ABSTRACT

A system and method for content-based search and retrieval
of visual objects. A base visual information retrieval (VIR)
engine utilizes a set of universal primitives to operate on the
visual objects. An extensible VIR engine allows custom,
modular primitives to be defined and registered. A custom
primitive addresses domain specific problems and can utilize
any image understanding technique. Object attributes can be
extracted over the entire image or over only a portion of the
object. A schema is defined as a specific collection of
primitives. A specific schema implies a specific set of visual
features to be processed and a corresponding feature vector
to be used for content-based similarity scoring. A primitive
registration interface registers custom primitives and
facilitates storing of an analysis function and a comparison
function to a schema table. A heterogeneous comparison
allows objects analyzed by different schemas to be compared
if at least one primitive is in common between the
schemas. A threshold-based comparison is utilized to
improve performance of the VIR engine. A distance between
two feature vectors is computed in any of the comparison
processes so as to generate a similarity score.

10 Claims, 16 Drawing Sheets 
THRESHOLD-BASED COMPARISON

RELATED APPLICATIONS

This application claims the benefit of the filing date of
U.S. patent application Ser. No. 60/014,893, filed Mar. 29,
1996, for "SIMILARITY ENGINE FOR CONTENT-BASED
RETRIEVAL OF OBJECTS", to Jain, et al.

MATERIAL SUBJECT TO COPYRIGHT PROTECTION

A portion of the disclosure of this patent document
contains material which is subject to copyright protection.
The copyright owner has no objection to the facsimile
reproduction by anyone of the patent document or the patent
disclosure, as it appears in the Patent and Trademark Office
patent file or records, but otherwise reserves all copyright
rights whatsoever.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to visual information
retrieval systems. More specifically, the invention is directed
to an extensible system for retrieval of stored visual objects
based on similarity of content to a target visual object.

2. Description of the Related Technology

One of the most important technologies needed across
many traditional and emerging applications is the management
of visual information. Every day we are bombarded
with information presented in the form of images. So
important are images in our world of information
technology, that we generate literally millions of images
every day, and this number keeps escalating with advances
in imaging, visualization, video, and computing technologies.

It would be impossible to cope with this explosion of
image information, unless the images were organized for
rapid retrieval on demand. A similar situation occurred in the
past for numeric and other structured data, and led to the
creation of computerized database management systems. In
these systems, large amounts of data are organized into fields
and important or key fields are used to index the databases
making search very efficient. These information management
systems have changed several aspects of the modern
society. These systems, however, are limited by the fact that
they work well only with numeric data and short alphanumeric
strings. Since so much information is in non-alphanumeric
form (such as images, video, speech), to deal
with such information, researchers started exploring the
design and implementation of visual databases. But creation
of mere image repositories is of little value unless there are
methods for fast retrieval of objects such as images based on
their content, ideally with an efficiency that we find in
today's databases. One should be able to search visual
databases with visual-based queries, in addition to alphanumeric
queries. The fundamental problem is that images,
video and other similar data differ from numeric data and
text in format, and hence they require a totally different
technique of organization, indexing, and query processing.
One needs to consider the issues in visual information
management, rather than simply extending the existing
database technology to deal with images. One must treat
images as one of the central sources of information rather
than as an appendix to the main database.

A few researchers have addressed problems in visual
databases. Most of these efforts in visual databases,
however, focussed either on only a small aspect of the
problem, such as data structures or pictorial queries, or on a
very narrow application, such as databases for pottery
articles of a particular tribe. Other researchers have developed
processing shells which use images.
Clearly, visual information management systems encompass
not only databases, but aspects of image processing and
image understanding, very sophisticated interfaces,
knowledge-based systems, compression and decompression
of images. Moreover, memory management and organization
start becoming much more serious than in the
largest alphanumeric databases.

A significant event in the world of information systems in
the past few years is the development of multimedia information
systems. A multimedia information system goes
beyond traditional database systems to incorporate various
modes of non-textual digital data, such as digitized images
and videos, in addition to textual information. It allows a
user the same (or better) ease of use and flexibility of storage
and access as traditional database systems. Today, thanks to
an ever-increasing number of application areas like stock
photography, medical imaging, digital video production,
document imaging and so forth, gigabytes of image and
video information are being produced every day. The need
to handle this information has resulted in new technological
requirements and challenges:

Image and video data are much more voluminous than
text, and need supporting technology for rapid and
efficient storage and retrieval.

There are several different modes in which a user would
search for, view, and use images and videos.

Even if multimedia information resides on different computers
or locations, it should easily be available to the
user.

The representation, storage, retrieval, visualization and
distribution of multimedia information is now a central
theme both in the academic community and industry alike.
What is needed is a capability to manage this information.

In traditional database systems, users search images by
keywords or descriptions associated with the visual information.
In a traditional database management system
(DBMS), an image is treated as a file name, or the raw image
data exists as a binary large object (BLOB). The limitation
is clear: a file name or the raw image data is useful for
displaying the image, but not for describing it. In some
applications, these shortcomings were overcome by having
a person participate in the process by interpreting and
assigning keyword descriptions to images. However, textual
descriptors such as a set of keywords are also inadequate to
describe an image, simply because the same image might be
described in different ways by different people. What is
needed is a new multimedia information system technology
model such as a visual information management system
(VIMSYS) model. Unlike traditional database systems, this
model recognizes that most users prefer to search image and
video information by what the image or video actually
contains, rather than by keywords or descriptions associated
with the visual information. The only proper method by
which the user can get access to the content of the image is
by using image-analysis technology to extract the content
from an image or video. Once extracted, the content represents
most of what the user needs in order to organize,
search, and locate necessary visual information.

This breakthrough concept of content extraction alleviates
several technological problems. The foremost benefit is that
it gives a user the power to retrieve visual information by
asking a query like "Give me all pictures that look like this."
The system satisfies the query by comparing the content of
the query picture with that of all target pictures in the
database. This is called Query By Pictorial Example
(QBPE), and is a simple form of content-based retrieval, a
new paradigm in database management systems.

Over the last five years research and development in
content-based retrieval of visual information has made
significant progress. Academic research groups have developed
techniques by which images and videos can be searched
based on their color, texture, shape and motion characteristics.
Commercial systems supporting this technology, such
as Ultimedia Manager from IBM, and the Visual Intelligence
Blade from Illustra Information Technologies, Inc. are
beginning to emerge.

A typical content-based retrieval system might be
described as follows: image features are precomputed during
an image insertion phase. These representations may include
characteristics such as local intensity histograms, edge
histograms, region-based moments, spectral characteristics,
and so forth. These features are then stored in a database as
structured data. A typical query involves finding the images
which are "visually similar" to a given candidate image. In
order to submit a query, a user presents (or constructs) a
candidate image. This query image may already have features
associated with it (i.e., an image which already exists
within the database), or may be novel, in which case a
characterization is performed "on the fly" to generate features.
Once the query image has been characterized, the
query executes by comparing the features of the candidate
image against those of other images in the database. The
result of each comparison is a scalar score which indicates
the degree of similarity. This score is then used to rank order
the results of the query. This process can be extremely fast
because image features are pre-computed during the insertion
phase, and distance functions have been designed to be
extremely efficient at query time. There are many variants on
this general scheme, such as allowing the user to express
queries directly at the feature level, combining images to
form queries, querying over regions of interest, and so forth.
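
The insertion-then-query flow described above can be illustrated with a
minimal sketch. It is not the implementation of any system named here:
the intensity histogram feature, the Euclidean distance, and the names
analyze, distance and query are all assumptions chosen for brevity.

```python
from math import sqrt

def analyze(pixels, buckets=4):
    """Hypothetical characterization: a normalized intensity histogram."""
    hist = [0] * buckets
    for p in pixels:  # each p is an intensity in 0..255
        hist[min(p * buckets // 256, buckets - 1)] += 1
    return [count / len(pixels) for count in hist]

def distance(fv_a, fv_b):
    """Assumed distance function: Euclidean distance between feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(fv_a, fv_b)))

def query(candidate_fv, database):
    """Score every stored feature vector and rank names by ascending distance."""
    scored = [(distance(candidate_fv, fv), name) for name, fv in database.items()]
    return [name for _, name in sorted(scored)]

# Insertion phase: features are computed once and stored as structured data.
images = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 180, 250],
}
database = {name: analyze(pixels) for name, pixels in images.items()}

# Query phase: a novel candidate is characterized "on the fly", then compared.
ranking = query(analyze([5, 15, 25, 60]), database)
print(ranking)
```

Only the stored feature vectors are touched at query time, which is why
the comparison step can stay cheap even when the images are large.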

General systems (using color, shape, etc.) are adequate for
applications with a broad image domain, such as generic
stock photography. In general, however, these systems are
not applicable to specific, constrained domains. It is not
expected, for example, that a texture similarity measure that
works well for nature photography will work equally well
for mammography. If mammogram databases need to be
searched by image content, one would need to develop
specific features and similarity measures. This implies that a
viable content-based image retrieval system will have to
provide a mechanism to define arbitrary image domains and
allow a user to query on a user-defined schema of image
features and similarity metrics.
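
As a rough illustration of a user-defined schema of image features and
similarity metrics, the sketch below keeps a table that maps each
primitive name to an analysis function and a comparison function, in the
spirit of the schema table mentioned in the abstract. The class name
Schema, its methods, and the two toy primitives are hypothetical and are
not taken from the VIR Engine interface.

```python
def analyze_mean_intensity(pixels):
    """Hypothetical primitive: mean intensity as a one-element feature."""
    return [sum(pixels) / len(pixels)]

def analyze_contrast(pixels):
    """Hypothetical primitive: intensity range as a one-element feature."""
    return [max(pixels) - min(pixels)]

def compare_abs(fv_a, fv_b):
    """Assumed comparison function: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(fv_a, fv_b))

class Schema:
    """A schema as a specific collection of registered primitives."""

    def __init__(self):
        self.table = {}  # primitive name -> (analysis fn, comparison fn)

    def register(self, name, analysis, comparison):
        self.table[name] = (analysis, comparison)

    def analyze(self, pixels):
        """Run every registered analysis function to build the features."""
        return {name: fn(pixels) for name, (fn, _) in self.table.items()}

    def compare(self, features_a, features_b):
        """Return one primitive score per registered primitive."""
        return {
            name: cmp(features_a[name], features_b[name])
            for name, (_, cmp) in self.table.items()
        }

schema = Schema()
schema.register("mean_intensity", analyze_mean_intensity, compare_abs)
schema.register("contrast", analyze_contrast, compare_abs)

scores = schema.compare(schema.analyze([0, 100, 200]),
                        schema.analyze([50, 100, 150]))
print(scores)
```

A domain such as mammography would register its own primitives in place
of the toy ones; the table lookup itself would not need to change.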

There is a need to provide a way to compare images
represented by different schemas. There is also a need to
reduce the time performing the comparison, especially when
large numbers of images are in the database.
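
One way to reduce comparison time, sketched here under stated
assumptions, is a threshold-based comparison: primitives are compared in
a chosen order, each weighted primitive score is added to a running
total, and the loop stops as soon as the total crosses a threshold. The
function name threshold_compare, the use of an absolute difference as
the primitive score, and the sample values are hypothetical.

```python
def threshold_compare(fv_a, fv_b, weights, order, threshold):
    """Accumulate weighted primitive scores in the given order and stop
    early once the running total exceeds the threshold.

    Returns the total reached and the number of primitives evaluated.
    """
    total = 0.0
    evaluated = 0
    for name in order:
        primitive_score = abs(fv_a[name] - fv_b[name])  # assumed metric
        total += weights[name] * primitive_score
        evaluated += 1
        if total > threshold:
            break  # already too dissimilar; skip the remaining primitives
    return total, evaluated

weights = {"color": 0.5, "texture": 0.25, "shape": 0.25}
order = ["color", "texture", "shape"]
target = {"color": 0.0, "texture": 0.0, "shape": 0.0}

far = threshold_compare(
    target, {"color": 8.0, "texture": 1.0, "shape": 1.0},
    weights, order, threshold=3.0)
near = threshold_compare(
    target, {"color": 2.0, "texture": 2.0, "shape": 2.0},
    weights, order, threshold=3.0)
print(far, near)
```

In this sketch the dissimilar image is rejected after a single primitive,
while the similar image is scored on all three. Placing the primitives
most likely to push the total over the threshold first in the ordering
would tend to maximize the savings.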

SUMMARY OF THE INVENTION 

The above needs are satisfied by the present invention
which is directed to a system and method for "content-based"
image retrieval, a technique which explicitly manages
image assets by directly representing their visual
attributes. A visual information retrieval (VIR) Engine
provides an open framework for building such a system. A
visual feature is any property of an image that can be
computed using computer-vision and image-processing
techniques. Examples are hue, saturation, and intensity
histograms; texture measures such as edge density,
randomness, periodicity, and orientation; shape measures
such as algebraic moments, turning angle histograms, and
elongatedness. Some of these features are computed
globally, i.e., over an entire image, and some are local, i.e.,
computed over a small region in the image. The VIR Engine
expresses visual features as image "primitives". Primitives
can be very general (such as color, shape, or texture) or quite
domain specific (face recognition, cancer cell detection,
etc.). The basic philosophy underlying this architecture is a
transformation from the data-rich representation of explicit
image pixels to a compact, semantic-rich representation of
visually salient characteristics. In practice, the design of
such primitives is non-trivial, and is driven by a number of
conflicting real-world constraints (e.g., computation time vs.
accuracy). The VIR Engine provides an open framework for
developers to "plug-in" primitives to solve specific image
management problems.

Various types of visual queries are supported by the VIR 
Engine as follows: 
Query by image property, wherein a user specifies a
property or attribute of the image, such as the arrangement
of colors, or sketches an object and
requests the system to find images that contain similar
properties. The Engine also allows the user to specify
whether or not the location of the property in the image
(e.g., blue at the bottom of the image or blue anywhere)
is significant.

Query by image similarity, wherein a user provides an
entire image as a query target and the system finds
images that are visually similar.

Query refinement or systematic browsing. With any of the
previous modes of query, the system produces some
initial results. A browsing query is one that refines the
query by either choosing an image from the previous
result set, or by modifying the parameters of the
original query in some way. The system in this situation
reuses the previous results to generate refined results.
An important concept in content-based retrieval is to
determine how similar two pictures are to one another. The
notion of similarity (versus exact matching as in database
systems) is appropriate for visual information because multiple
pictures of the same scene will not necessarily "match,"
although they are identical in content. In the paradigm of
content-based retrieval, pictures are not simply matched, but
are ranked in order of their similarity to the query image.
Another benefit is that content extraction results in very high
information compression. The content of an image file may
be expressed in as little as several hundred bytes of memory,
regardless of the original image size. As an image is inserted
into a VIMSYS database, the system extracts the content in
terms of generic image properties such as its color, texture,
shape and composition, and uses this information for all
subsequent database operations. Except for display, the
original image is not accessed. Naturally, the VIMSYS
model also supports textual attributes as do all standard
databases.

The VIR technology improves query success in many 
applications where images are collected, stored, retrieved, 
compared, distributed, or sold. Some applications for VIR 
technology include: managing digital images by stock photo
agencies, photographers, ad agencies, publishers, libraries,
and museums; managing digital video images for production
houses and stock-footage providers; visually screening or 
comparing digital images in medicine and health care; 
searching files of facial images for law enforcement, credit 
card, or banking applications; satellite imaging; manufacturing
test and inspection; manufacturing defect classification;
and browsing an electronic catalog for on-line shopping.

In one aspect of the invention, there is a method of visual
object comparison for a database of visual objects, comprising
the steps of: a) applying primitives to a first visual object
to extract a first feature vector, each primitive providing at
least one primitive value to the first feature vector; b)
applying primitives to a second visual object to extract a
second feature vector, each primitive providing at least one
primitive value to the second feature vector; c) providing an
ordering value for each primitive to order the primitives; d)
comparing one of the primitive values from the first feature
vector with the corresponding primitive value of the second
feature vector according to the ordering so as to obtain a
primitive score; e) applying a primitive weight to the primitive
score to determine a weighted primitive score; f)
summing the weighted primitive score into a summed total
score; and g) repeating steps d-f until the summed total
score crosses a selected threshold.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be described in further detail
with reference to the accompanying drawings, in which:

FIG. 1A is a block diagram of the modules of one
embodiment of a visual information retrieval (VIR) system.

FIG. 1B is a block diagram of a hardware configuration
for the VIR system of FIG. 1A.

FIG. 2 is an exemplary screen display seen while executing
the query canvas module 108 shown in FIG. 1A.

FIG. 3 is an exemplary screen display seen during execution
of the alphanumeric query input module 106, or subsequent
to execution of the query canvas module 108 or
image browsing module 110 shown in FIG. 1A.

FIG. 4 is an exemplary screen display seen while executing
the thumbnail results browser 136 shown in FIG. 1A.

FIGS. 5A and 5B are a high-level flow diagram showing
the operation of the VIR system shown in FIG. 1A which
includes the Base VIR Engine.

FIG. 6 is a block diagram showing the components of the
Extensible VIR Engine.

FIG. 7 is a block diagram of an exemplary VIR system
utilizing the Extensible VIR Engine of FIG. 6.

FIG. 8 is a high level flowchart of the operation of the
Extensible VIR Engine shown in FIG. 6.

FIG. 9 is a flow diagram of portions of another embodiment
of a VIR system utilizing the Extensible VIR Engine
of FIG. 6.

FIG. 10 is a flowchart of the run analyzer function 366
shown in FIG. 8.

FIG. 11 is a flowchart of the standard comparison function
396 shown in FIG. 9.

FIG. 12 is a flowchart of the threshold comparison function
398 shown in FIG. 9.

FIG. 13 is a flowchart of a schema creation and primitive
registration function which is performed, in part, by the
primitive registration interface 306 shown in FIG. 6.

FIG. 14 is a flowchart of a top "N" query function
performed by either the Base VIR Engine of FIG. 1A or the
Extensible VIR Engine shown in FIG. 6.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

The following detailed description of the preferred
embodiment presents a description of certain specific
embodiments of the present invention. However, the present
invention can be embodied in a multitude of different ways
as defined and covered by the claims. In this description,
reference is made to the drawings wherein like parts are
designated with like numerals throughout.

For convenience, the discussion of the preferred embodiment
is organized into the following principal sections:
Introduction and Model, Base VIR Engine and System,
Extensible VIR Engine and System, Applications, and
Application Development.

I. INTRODUCTION AND MODEL

The VIR Engine is a library-based tool that is delivered
in binary form (an object library with header file interfaces)
on various platforms, and provides an American National
Standards Institute (ANSI) "C" language interface to the
application developer. It provides access to the technology
of Visual Information Retrieval (VIR), which allows images
to be mathematically characterized and compared to one
another on the basis of "visual similarity". Applications may
now search for images or rank them based on "what they
look like". The VIR Engine looks at the pixel data in the
images, and analyzes the data with respect to
attributes such as color, texture, shape, and structure. These
visual attributes are called "primitives", and the image
characterization is built up from these. Images which have
been analyzed may then be compared mathematically to
determine their similarity value or "score". Images are
analyzed once, and the primitive data is then used for fast
comparisons.

A first embodiment of the invention provides a "Base VIR
Engine API" which has a fixed set of visual primitives, and
the necessary calls for analyzing and comparing images. A
second embodiment of the invention provides an "Extensible
VIR Engine API" which allows application developers
the ability to create new visual primitives for specialized,
vertical applications. This enables application developers to
capture higher level semantic information about the images
being analyzed, and create intelligent applications in specific
domains.

The main functions of the Base Engine application programming
interface (API) are: initialization and global
definitions, image analysis functions, similarity comparison
functions, scoring functions, and weights management. In
addition to the functionality of the Base Engine, the Extensible
Engine also has primitive registration and schema
management. The entry points for these functions are
defined in regular "C" header files.

The VIR Engine has a "stateless" architecture in which all
of the data about images is managed and stored by the
application. Applications are responsible for passing "raw"
image data (e.g., red, green, blue (RGB) format buffers) into
the engine, and then handling the feature data and scoring
information that is returned to the application by the Engine.
When a comparison is desired, the application passes the
feature data for a pair of images back to the Engine to obtain
a final score. Thus, all persistent data management, query set
management, and similar activities, are the responsibility of
the application developer. The Engine makes no assumptions
about storage methodologies, formats, list
management, or any information structures that require state
information.

Similarity scoring is a comparison of images based on a
conceptual "feature space", where each image is a "point" in
this space. The similarity score is a number that represents
the abstract distance between two given images in this space.
Each visual primitive provides a component of the overall
similarity score; that is, each primitive provides its own
multi-dimensional feature space. An overall visual similarity
score is provided by combining the primitive scores in a way
that is visually meaningful. This is both application and user
dependent; therefore the Engine allows the application to
pass in a set of weightings that define the "importance" of
each primitive in computing the overall score. In the presently
preferred embodiment, the scores are normalized in the
range [0 . . . 100].

The Virage Model of Visual Information

Following the aforementioned VIMSYS data model for
visual information, Virage technology admits four layers of
information abstraction: the raw image (the Image Representation
Layer), the processed image (the Image Object
Layer), the user's features of interest (called the Domain
Object Layer) and the user's events of interest for videos or
other collections of sequenced images (the Domain Event
Layer). The top three layers form the content of the image
or video. A discussion of representing the abstracted information
by data types follows. The data types pertain to the
top three layers of the model.

Data Types

A content-based information retrieval system creates an
abstraction of the raw information in the form of features,
and then operates only at the level of the abstracted information.
In general, data types and representation issues are
only constrained by the language used for an implementation.

One presently preferred implementation is as follows. For
visual information, features may belong to five abstract data
types: values, distributions, indexed values, indexed
distributions, and graphs. A value is, in the general case, a set
of vectors that may represent some global property of the
image. The global color of an image, for example, can be a
vector of RGB values, while the dominant colors of an
image can be defined as the set of k most frequent RGB
vectors in an image. A distribution, such as a color histogram,
is typically defined on an n-dimensional space which has
been partitioned into b buckets. Thus, it is a b-dimensional
vector. An indexed value is a value local to a region of an
image or a time point in a video or both; as a data type it is
an indexed set of vectors. The index can be one-dimensional
as in the key-frame number for a video, or it can be
multi-dimensional as in the orthonormal bounding box coordinates
covering an image segment. An indexed distribution
is a local pattern such as the intensity profile of a region of
interest, and can be derived from a collection of
b-dimensional vectors by introducing an index. A graph
represents relational information, such as the relative spatial
position of two regions of interest in an image. We do not
consider a graph as a primary type of interest, because it can
be implemented in terms of the other four data types, with
some application-dependent rules of interpretation (e.g.
transitivity of spatial predicates, such as left-of).

It follows from the foregoing discussion that vectors form
a uniform base type for features representing image content.
In a presently preferred embodiment, the primary data type
in the VIR Engine is a (indexable) collection of feature
vectors (FVs).

Primitives

Image objects have computable image properties or
attributes that can be localized in the spatial domain
(arrangement of color), the frequency domain (sharp edge
fragments), or by statistical methods (random texture).
These computed features are called primitives. Primitives
are either global, computed over an entire image, or local,
computed over smaller regions of the image. For each
generic image property such as color, texture, and shape, a
number of primitives may be computed. Besides this conceptual
definition of a primitive, the specific implementation
may also be referred to as a primitive. For instance, the
collection of functions to extract and compare an image
attribute may be referred to as a primitive.

Distance Metrics

Since primitives are extracted by different computational
processes, they belong to different topological spaces, each
having different distance metrics defined for them.
Computationally, these metrics are designed to be robust to
small perturbations in the input data. Because the abstracted
image primitives are defined in topological spaces, searching
for similarity in any image property corresponds to
finding a (partial) rank order of distances between a query
primitive and other primitives in that same space. Also, since
the space of image properties is essentially
multidimensional, several different primitives are necessary
to express the content of an image. This implies that
individual distance metrics need to be combined into a
composite metric using a method of weighted contributions.

Primitive Weighting

The overall similarity between two images lies literally
"in the eye of the beholder." In other words, the perceptual
distance between images is not computable in terms of
topological metrics. The same user will also change his or
her interpretation of similarity depending on the task at
hand. To express this subjective element, the VIR interface
provides functions to allow the user to control which relative
combinations of individual distances satisfy his or her
needs. As the user changes the relative importance of
primitives by adjusting a set of weighting factors (at query
time), the VIR system incorporates the weight values into
the similarity computation between feature vectors.

The information model described above is central to the
system architecture. All other aspects such as the keywords
associated with images, the exact nature of data management
and so forth are somewhat secondary and depend on the
application environments in which the technology is used.
The software aspects of this core technology are explained
hereinbelow. An explanation of the different environments
in which the core model is embedded also follows.

II. THE BASE VIR ENGINE AND SYSTEM

The VIR system technology is built around a core module
called the VIR Engine and operates at the Image Object
Level of the VIMSYS model. There are three main functional
parts of the Engine: Image Analysis, Image
Comparison, and Management. These are invoked by an
application developer. Typically, an application developer
accesses them during image insertion, image query, and
image requery (a query with the same image but with a
different set of weighting factors). The function of each unit,
and how the application developer uses the VIR Application
Programming Interface (API) to exchange information with
the VIR Engine is described below. The full capabilities of
the Engine are decomposed into two API sets: a Base VIR
Engine, and an Extensible VIR Engine. The Base Engine
provides a fixed set of primitives (color, texture, structure,
etc.) while the Extensible Engine provides a set of mechanisms
for defining and installing new primitives (discussed
in detail later).

Base System Modules

Referring to FIG. 1A, the modules of an embodiment of
a visual information retrieval (VIR) system 100 that utilizes
the Base VIR Engine 120 will be described. A user 102
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com m unicates with the system 100 by use of comput er 
input/output 104. The computer I/O 104 will be further 
described in conjunction with FIG. IB. The user 102 initiates 
one of several modules or functions 106-114 that output to 
either the VIR Engine 120 or a database engine 130. The 
database engine 130 can be one of the many commercially 
available database engines available on the market, such as 
available from Informix Software, Inc., or IBM DB2. 

An "Alpha-numeric query input" module 10 6 allows th e 
user to specify a target object by alpha numeric attributes, 
such as shown in an exemplar y Query Window screen o f 
Fia3.T he output ot thr£ module bypasses the VIR Engine 
"fZtTand is used as a direct input to the database engine 130. 

A "Query Canvas" module 108 provides a visual query 
input to the VIR Engine 120. The Query Canvas module 108 
will be further described in conjunction with FIG. 2. 

An "Image Browsing" module 110 provides a visual
input, such as an image from a file or database accessible to
the user 102. The file or database may be on the user's
computer, such as on a hard drive, CD-ROM, digital video/
versatile disk (DVD) drive, tape cartridge, ZIP media, or
other backup media, or accessible through a network, such
as a local area network (LAN), a wide area network (WAN)
or the Internet. The visual input is provided to the VIR
Engine 120. An "Insertion" module 112 is used to provide
one or more new images to be added to a database 132
accessible by the database engine 130. The new image(s) are
provided as inputs to the VIR Engine 120. Note that
references to the database 132 may be to a portion or a partition
of the entire database, such as, for example, visual objects
associated with a particular domain. Therefore, visual
objects for multiple domains or subsets of a domain could be
stored in separate databases, or they may be stored in one
database.

An "Other Database Management" module 114 is used to
initiate standard database operations on database 132.
Module 114 communicates directly with the database engine 130.

The VIR Engine 120 comprises two main modules: an 
"Image Analysis" module 122 and an "Image Comparison" 
module 124. The image analysis module 122 receives inputs 
from either module 108 or 110 to generate a query target or 
from the insertion module 112 for adding a new image into 
the database 132. The output of the image analysis module 
122 is a feature vector (FV) that describes the visual object 
passed to it by one of modules 108, 110 or 112. The FV is 
passed on to the database engine 130. In addition, if module 
112 was used to insert the image into the database, both the 
FV for the image and the image itself are stored in the 
database 132 (as seen in FIG. 5B). The image analysis
module 122 will be described in greater detail hereinbelow. 

The image comparison module 124 receives a query 
target FV and a FV for the image being tested or compared 
from the database engine 130. The output of the image 
comparison module 124 is a similarity score that is sent to 
a "Ranked List Management" module 134. A plurality of 
images from the database 132 are compared one at a time to 
the query image by the image comparison module 124. The 
resultant similarity scores are accumulated by the module 
134 so as to provide a rank in an order of their similarity to 
the query image. The ranked results of the list management 
module 134 are provided to a "Thumbnail Results Browser" 
136 for display to the user 102 through the computer I/O 
104. An exemplary screen display of ranked results is shown 
in FIG. 4. 

Referring now to FIG. 1B, a hardware configuration for
the VIR system of FIG. 1A will be described. A computer or
workstation 140a communicates with a server 160 by a
network 162, such as a local area network (LAN) or wide
area network (WAN). One or more additional computers or
workstations 140b can be connected to the server 160 by the
network 162. The computers 140a and 140b can be a
personal computer, such as one utilizing an Intel microprocessor
chip (at minimum, an 80486 model) or a Motorola PowerPC
chip, or a workstation utilizing a DEC Alpha chip, a SPARC
chip, a MIPS chip, or other similar processor 144. A
computer enclosure 142 contains the processor 144, a storage
device 146 connected to the processor 144, preferably of at
least 1-2 Gigabytes, and a memory of at least 32 Megabytes
(not shown). Connected to the processor 144 are a plurality
of I/O devices 104 (FIG. 1A) including a visual monitor 148,
a printer 150, a pointing device (such as a mouse, trackball
or joystick) 152, and a keyboard 154. Optional I/O devices
include a scanner 156 and a backup unit 158. The server 160
typically has similar or greater processing power than the
computers 140a and 140b and typically has a larger capacity
storage device and memory. The server 160 also has a
backup facility to safeguard the programs and data. The
server 160 may be connected to remote computers similar to
computer 140a by a modem 164 to another network 166,
which may be a WAN or the Internet, for example.

The present invention is not limited to a particular
computer configuration. The hardware configuration described
above is one of many possible configurations. Other types of
computers, servers and networks may be utilized.

In one embodiment of the system 100, the modules shown
in FIG. 1A may all be physically located on one computer
140a. In another embodiment of system 100, the computer
I/O 104, and modules 106-114 and 134-136 could be
located on computer 140a, while the VIR Engine 120, the
database engine 130 and the database store 132 could all be
located on the server 160. In yet another embodiment of
system 100 that is similar to the previous embodiment, the
VIR Engine 120 could be on server 160 and the database
engine 130 and the database store 132 could be located on
another server (not shown) on the network 162. Other
combinations of the above modules are also possible in yet
other embodiments of the system 100. Furthermore,
individual modules may be partitioned across computing
devices.
Query Canvas 

Referring to FIG. 2, an exemplary screen display 180 of
the Query Canvas module 108 will be described. The Query
Canvas is a specific user-interface mechanism that is an
enhancement to the query specification environment. The
Query Canvas provides a bitmap editor to express the query
visually, and serves as an input to the Image Analysis
module 122 (FIG. 1A). The canvas may begin as a blank
slate in a canvas window 181, or may have an existing image
pre-loaded into it (drag and drop an image from an existing
image collection) prior to modification with a set of
painting/drawing tools. These tools include, for example,
standard brushes 184, pens, region fills, a magic wand to
define regions, ovals 186, rectangles, lines, and so forth. A
color palette 188 is provided, with the ability to define new
colors from a color chooser. A palette of textures 190 is also
provided, with the ability to select new textures from a large
library.

Once an image, such as image 182, has been created, it
can be submitted as a query to the system. The Query
Canvas tool saves the user significant initial browsing time
in those cases where he or she already has an idea of what
the target images should look like. Since the query canvas
allows modification of images, it encompasses the
functionality of the "query-by-sketch" paradigm.
Of course, one will recognize that the present invention is
not limited to any particular type of query creation.

Query Window

Referring to FIG. 3, an exemplary screen display 200 of
a Query Window will be described. The Query Window or
form 200 is provided to specify alphanumeric information
201 such as keywords, dates, file name masks, project or
client names, and so forth. The Query Window 200 also
shows an iconic image 202 of the current contents of the
Query Canvas 108 (FIG. 1A) which expresses the visual
component of the query.

However, the most important aspect of the Query Window
200 is the set of sliders (such as slider 208) that control the
relative importance or weights 204 for the visual and textual
aspects of the query. There are sliders to indicate the
importance of visual query attributes such as Color, Texture 206,
Shape, Location, and textual query attributes such as Keywords.
The ability to select perceptual weights of attributes is a
critical aspect of the visual query over which the user has
control. Of course, other attributes and ways of selecting
weights are encompassed by the present invention.

Query Results

Referring to FIG. 4, an exemplary screen display 220 of
Query Results will be described. The Query Results 220 are
displayed to the user 102 by the thumbnail results browser
136 (FIG. 1A). A thumbnail (reduced size) image 222 of the
query image is preferably shown in the upper left corner of
the visual display 148 (FIG. 1B). A thumbnail 224 of the
image that has the best similarity score, indicative of the
closest match to the query image, is shown to the right of the
query image 222. A thumbnail 226 of the image having the
second best similarity score is shown to the right of image
224, and so forth for a predetermined number of thumbnail
images shown to the user 102. A mechanism (not shown) to
access a next screen of ranked thumbnails is available. The
similarity score of each of the ranked images may be
optionally shown in conjunction with the thumbnails. Of
course, the present invention is not limited to the particular
presentation of search results.

Operational Flow of Base VIR System

Referring to FIGS. 5A and 5B, a high-level flow diagram
showing the operation 240 of the VIR system 100, including
the Base VIR Engine 120, will be described. The user 102
(FIG. 1A) preferably initiates query generation 242 by either
utilizing the query canvas 108 to create a query, or browses
110 the available file system to locate an existing object to
use as the query, or browses 246 the database store 132 (FIG.
1A and FIG. 5B) to identify an image that has already been
analyzed by the analysis module 122. In the last situation, if
the image is already in the database 132, a feature vector has
been computed and is retrieved at state 247 from a feature
vector storage portion 264 of the database 132. A target
image 248 results if either of the query canvas module 108
or browse file system module 110 is used to generate a
query. The target image 248 is input to the analysis module
122 to generate a feature vector for the target image as the
output. Because of the importance of the primitives in the
system 100, a digression is now made to describe the base
system primitives.

Default Primitives

The Base VIR Engine 120 has a fixed or default set of
primitives. Primitives and their weights are identified and
indicated using a tagging mechanism to identify them in the
API calls. The default primitives of the presently preferred
Base Engine are:

Local Color (250): analyzes localized color and the spatial
match-up of color between two images.

Global Color (252): considers both the dominant color
and the variation of color throughout the entire image.

Structure (254): determines large scale structure in the
image as represented mainly by edges with strong
matching for the location and orientation of edge
features.

Texture (256): analyzes areas for periodicity, randomness,
and roughness (smoothness) of fine-grained textures in
images.

Image Analysis

Returning now to the analysis module 122, the analysis
function performs several preprocessing operations, such as
smoothing and contrast enhancement, to make the image
ready for different primitive-extraction routines. Each
primitive-extraction routine takes a preprocessed image,
and, depending on the properties of the image, computes a
specific set of data, called feature data, for that primitive.
Feature data is data that typically represents some image
feature that is extracted by one primitive. The feature data
typically is a mathematical characterization of the visual
feature. A feature vector is a concatenation of a set of feature
data elements corresponding to a set of primitives in a
schema (further described hereinbelow). The feature vector
preferably has header information that maps the feature data
contained within it.

When the analysis module 122 is utilized to insert images
into the database 132, the feature vector of the computed
primitive data is stored in a data structure 264. In essence,
the application provides a raw image buffer to the VIR
Engine, and the Engine returns a pointer to a set of data
containing the extracted primitive data. The application is
then responsible for storing and managing the data in a
persistent fashion. The VIR Engine 120 operates in a
"stateless" fashion, which means it has no knowledge of how
the primitive data is organized or stored, or how the results of
queries are managed. There is no transaction management at
the Engine API level. This property means that system
developers and integrators need not worry about conflicts
between the VIR Engine and other application components
such as databases, client-server middleware, and so forth.

Proceeding to state 260 of FIG. 5A, the feature vector of
the query target is submitted to a Query Processor 261 (FIG.
5B). The Query Processor 261 obtains a candidate feature
vector for an image "i" from feature vector storage 264 (part
of database 132). The feature vector of the target image
(FV_target) and the candidate feature vector (FV_i) are then
both submitted to the comparison module 124.

Comparisons

There are several ways to compute the similarity of two
images using the Engine API. Each method involves
computing one or more similarity distances for a pair of
primitive vectors. The computation of the similarity distance
is performed in two steps. First, for each primitive, such as
local color 270, global color 272, structure 274 or texture
276, a similarity distance (score) is computed. Similarity
scores for primitives are further discussed in conjunction
with FIG. 11. These scores (s_i) are then combined with
weights (w_i) 282 at state 280 by a judiciously chosen
function that forms a final score. The final combined score
may, for instance, be generated by a linear combination or a
weighted sum as follows:

$$S = \sum_{i} w_i s_i$$

The final score is used to rank results 286 at state 284 by
similarity. An image 288 with the best score (the lowest
score in the presently preferred embodiment) is ranked as the
closest match. Of course, the definition of "similarity" at this
point is determined by the set of weights 282 used.
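
As a minimal sketch of the weighted-sum combination described above
(not the Engine's actual code), the Python below forms a final score
from per-primitive distances. The function name combine_scores, the
primitive keys, and all numeric values are hypothetical and chosen
only for illustration.

```python
from typing import Mapping


def combine_scores(scores: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Return the weighted sum of per-primitive distances.

    A primitive with no entry in `weights` contributes nothing.
    """
    return sum(weights.get(name, 0.0) * s for name, s in scores.items())


# Hypothetical per-primitive distances (0 = identical, 100 = worst match).
scores = {"local_color": 20.0, "global_color": 10.0,
          "structure": 40.0, "texture": 30.0}
# Hypothetical weights chosen by the application.
weights = {"local_color": 0.4, "global_color": 0.3,
           "structure": 0.2, "texture": 0.1}

final_score = combine_scores(scores, weights)  # lower means more similar
```

Because lower scores mean closer matches in the preferred embodiment,
raising a primitive's weight makes disagreement on that primitive
cost more in the final ranking.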

Applications may also synthesize a property weighting
(such as "composition") by intelligently applying weights
during comparisons. If "composition" is weighted low, then
global primitives should be emphasized; if it is weighted
high, then local primitives should be emphasized.
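
One way an application might synthesize such a property is sketched
below in Python. The linear blend, the [0, 1] range of the setting,
and the names composition_weights, global_color and local_color are
all assumptions made for illustration; the text does not prescribe a
particular mapping.

```python
def composition_weights(composition: float) -> dict:
    """Map a 'composition' setting in [0, 1] to two primitive weights.

    Assumption: a simple linear blend. A low setting emphasizes the
    global primitive; a high setting emphasizes the local primitive.
    """
    if not 0.0 <= composition <= 1.0:
        raise ValueError("composition must be in [0, 1]")
    return {"global_color": 1.0 - composition, "local_color": composition}


low = composition_weights(0.25)   # global color dominates
high = composition_weights(0.75)  # local color dominates
```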

Decision state 290 determines if there are more images in
the database 132 that need to be evaluated by the comparison
module 124. If so, the Query Processor continues at state
262 by obtaining the next candidate feature vector. If all the
candidate images in the database 132 have been evaluated,
processing advances to state 292 wherein thumbnails
corresponding to a predetermined number of the top-ranked
images are retrieved from the image storage portion 266 of
database 132 and are displayed to the user at state 294.
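
The candidate loop and ranking step can be sketched in Python as
follows. This is an illustration under stated assumptions, not the
patented implementation: feature vectors are modeled as short tuples
of numbers, l1_distance is a stand-in comparison function, and
rank_candidates and the image identifiers are hypothetical names.

```python
import heapq
from typing import Callable, Iterable, List, Sequence, Tuple

FeatureVector = Sequence[float]


def l1_distance(a: FeatureVector, b: FeatureVector) -> float:
    """Stand-in comparison: sum of absolute differences (0 = identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(target_fv: FeatureVector,
                    candidates: Iterable[Tuple[str, FeatureVector]],
                    compare: Callable[[FeatureVector, FeatureVector], float],
                    top_n: int) -> List[Tuple[float, str]]:
    """Score every stored candidate against the target, one at a time,
    and keep the top_n lowest (best) scores in ascending order."""
    scored = ((compare(target_fv, fv), image_id)
              for image_id, fv in candidates)
    return heapq.nsmallest(top_n, scored)


# Hypothetical stored feature vectors keyed by image identifier.
stored = [("img_a", (0.9, 0.1)), ("img_b", (0.2, 0.8)), ("img_c", (0.5, 0.5))]
ranked = rank_candidates((0.25, 0.75), stored, l1_distance, top_n=2)
```

Only the identifiers in the ranked list need to be resolved to
thumbnails, so image data is fetched for the displayed results only.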
Management 

There are several supporting functions that fall in the
category of "management." These include initialization,
allocation and de-allocation of weights and scores
structures, and management of primitive vector data.

III. THE EXTENSIBLE VIR ENGINE AND SYSTEM

The Extensible VIR Engine introduces the notion of a
"schema". A schema is a specific collection of primitives
(default and/or application-specific) which are used in an
application for the purpose of comparing images. When a
group of primitives is registered, the system returns a
schema ID to be used for future reference when creating
weights and scores structures.

The Extensible VIR Engine is an open, portable and
extensible architecture to incorporate any domain specific
information schema. The Extensible VIR Engine
architecture can be extended not only across application domains,
but across multiple media such as audio, video, and
multidimensional information.

The purpose of the Extensible Engine is to provide to the
application developer the flexibility of creating and adding
custom-made primitives to the system. For example, a
face-matching system might construct primitives called
"LeftEye" and "RightEye", and provide an interface that
compares faces based on the similarity of their eyes.
Developer-Defined Primitives 

In terms of the VIR Engine, a collection of vectors
representing a single category of image information is a
primitive. A primitive is a semantically meaningful feature
of an image. Thus color, texture, and shape are all general
image primitives. Of course, not all primitives will be
applicable across all images. For instance, a color primitive
may have no relevance with respect to X-ray imagery. In
practice, a primitive is specified by a developer as a 6-tuple
of the following values:
Static information
  primitive_id: a unique primitive identifier
  label: a category name for the primitive
Data retrieval functions
  analysis_function: This function essentially accepts
    the image data and computes its visual feature data
    and stores it in a buffer. The function must accept an
    RGB image buffer, its attributes (height, width) and
    based on this information, perform any desired
    computation on the pixel data in the buffer. The results of
    this computation (i.e., feature computation) can be
    anything. The primitive decides what it wants to
    return as the feature data. The feature data is returned
    by passing back a pointer to the data and a byte count
    telling the VIR Engine how much data is there. The
    Engine then takes the data and adds it to the vector
    being constructed.
  compare_function: This function returns the
    similarity score for its associated primitive. The query
    operations of the engine call this function with two
    data buffers (previously created with
    analysis_function( )) to be compared. The score which is
    returned is preferably in the range [0.0 . . . 100.0],
    wherein a "perfect" match returns a value of
    zero and a "worst" match returns a value of 100. The
    score is best considered to be a "distance" in "feature
    space". For maximum discrimination, the spectrum
    of distances returned for this primitive should be
    spread over this range evenly or in a reasonably
    smooth distribution.
Data management functions
  swap_function: The engine takes full responsibility
    for handling the byte order difference between
    hardware platforms for easy portability. This allows data
    that is computed on a certain platform to be easily
    used on any other platform, regardless of byte-order
    differences. Each primitive supplies this function
    which will do the byte-order conversions of its own
    data. The engine will automatically use this function
    when necessary, to provide consistent performance
    across any platform.
  print_function: This function is used to print out the
    desired information of the associated primitive.
After a primitive is defined, it is registered with the
Extensible VIR Engine using the RegisterPrimitive( )
function. Once registered, data associated with a custom
primitive is managed in the visual feature structures in the same
manner as the default primitives. From there, the new
primitive can be incorporated into any schema definition by
referencing the primitive_id just like a built-in (default)
primitive. Application developers may define any type of
data structure(s) to handle the data associated with their
primitive. It is preferably required that the structure(s) can
collapse into a BLOB to be passed back and forth via the
registered procedures. In addition to the above primitive
information, an estimated cost of comparison may also be
supplied for the primitive, to aid in query optimization
performed by the engine.
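
The 6-tuple and the registration step can be illustrated with the
following Python sketch. It is not the Engine's API: the Primitive
record, the Registry class, its register_primitive and
register_schema methods, the toy "Brightness" primitive, and the
identifier 1001 are all hypothetical stand-ins, and feature data is
modeled as a small byte string (a BLOB).

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Primitive:
    """The six values from the text, rendered as a Python record."""
    primitive_id: int
    label: str
    analysis_function: Callable[[bytes, int, int], bytes]
    compare_function: Callable[[bytes, bytes], float]
    swap_function: Callable[[bytes], bytes]
    print_function: Callable[[bytes], str]


class Registry:
    """Hypothetical stand-in for primitive and schema registration."""

    def __init__(self) -> None:
        self.primitives: Dict[int, Primitive] = {}
        self.schemas: Dict[int, Tuple[int, ...]] = {}

    def register_primitive(self, primitive: Primitive) -> None:
        if primitive.primitive_id in self.primitives:
            raise ValueError("primitive_id already registered")
        self.primitives[primitive.primitive_id] = primitive

    def register_schema(self, primitive_ids: Sequence[int]) -> int:
        unknown = [p for p in primitive_ids if p not in self.primitives]
        if unknown:
            raise KeyError(f"unregistered primitive ids: {unknown}")
        schema_id = len(self.schemas) + 1
        self.schemas[schema_id] = tuple(primitive_ids)
        return schema_id


# Toy primitive: mean brightness stored as one little-endian float32.
def analyze_brightness(rgb: bytes, height: int, width: int) -> bytes:
    return struct.pack("<f", sum(rgb) / (height * width * 3))


def compare_brightness(a: bytes, b: bytes) -> float:
    (x,), (y,) = struct.unpack("<f", a), struct.unpack("<f", b)
    return abs(x - y) / 255.0 * 100.0  # distance in [0, 100], 0 = perfect


def swap_brightness(data: bytes) -> bytes:
    return data[::-1]  # reverse the 4 bytes of the single float32


def print_brightness(data: bytes) -> str:
    return f"mean brightness = {struct.unpack('<f', data)[0]:.1f}"


registry = Registry()
registry.register_primitive(Primitive(
    1001, "Brightness", analyze_brightness, compare_brightness,
    swap_brightness, print_brightness))
schema_id = registry.register_schema([1001])
```

Keeping the feature data as an opaque byte string mirrors the
requirement that a primitive's structures collapse into a BLOB, so
the registry never needs to understand a primitive's internal layout.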

In another implementation of the present inventive
extensible search engine, a primitive may be defined in an
object-oriented language such as, for example, C++. In an
object-oriented language, an object is defined to include data
and methods for operating on the data. One text for C++
programming, C++ Primer by Stanley Lippman, Second
Edition, Addison-Wesley, is incorporated herein by
reference.

Objects are created from classes defined by the author of 
an API. The base class may then be subclassed to provide a 
specific primitive, a color primitive for instance. The API 
author will then overload, say, a compare function and an 
analysis function. Thus, an extended primitive is added to 
the engine by object-oriented subclassing and function (or 
method) overloading. Such an embodiment will be under- 
stood by one of skill in the relevant field of technology. 
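
The subclassing approach can be sketched as follows. Python's
abstract base classes are used here in place of C++ for brevity; the
class names PrimitiveBase and RedShare, the method names, and the toy
"share of red" feature are hypothetical and serve only to show a base
class whose analysis and comparison methods are supplied by a
subclass.

```python
import struct
from abc import ABC, abstractmethod


class PrimitiveBase(ABC):
    """Base class: a concrete primitive must supply both methods."""

    @abstractmethod
    def analyze(self, rgb: bytes, height: int, width: int) -> bytes:
        """Compute feature data from an RGB buffer."""

    @abstractmethod
    def compare(self, a: bytes, b: bytes) -> float:
        """Return a distance in [0, 100]; 0 is a perfect match."""


class RedShare(PrimitiveBase):
    """Toy color primitive: fraction of total intensity that is red."""

    def analyze(self, rgb: bytes, height: int, width: int) -> bytes:
        total = sum(rgb)
        red = sum(rgb[0::3])  # every third byte is a red sample
        share = red / total if total else 0.0
        return struct.pack("<d", share)

    def compare(self, a: bytes, b: bytes) -> float:
        (x,) = struct.unpack("<d", a)
        (y,) = struct.unpack("<d", b)
        return abs(x - y) * 100.0


primitive = RedShare()
red_fv = primitive.analyze(bytes([255, 0, 0] * 4), 2, 2)
blue_fv = primitive.analyze(bytes([0, 0, 255] * 4), 2, 2)
distance = primitive.compare(red_fv, blue_fv)
```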

More specifically, abstract C++ classes using pure virtual
functions may define the interface. Furthermore, the
object-oriented system implementation could follow the Object
Management Group (OMG) standards. Presently, OMG is
working on an Object Query Service standard which is
defined by Object Services Architecture (Revision 6.0),
which is incorporated by reference. Further information on
object-oriented database standards can be found in The
Object Database Standard: ODMG 93, edited by Cattell,
Morgan Kaufmann Publishers, which is incorporated herein
by reference.

Schema Definition

Databases require a consistent structure, termed a schema,
to organize and manage the information. As used herein, in
particular, a schema is a specific collection of primitives. A
specific schema implies a specific set of visual features to be
processed and a corresponding feature vector to be used for
content-based similarity scoring. A VIR Engine schema is
defined as a 2-tuple: a schema id, and an ordered set of
primitives. Similar to primitives, the Extensible VIR Engine
is notified of a new schema by a RegisterSchema( ) function.
The primitive IDs referenced here must have previously
been defined using RegisterPrimitive( ), or must be one of
the default primitives. The order in which the primitives are
referenced dictates the order in which their functions are
called during feature extraction (but not during query
processing). This allows primitives to work synergistically
and share computational results. A single application is
allowed to define and use multiple schemas. The Extensible
VIR Engine operates as a stateless machine and therefore
does not manage the data. Hence the calling application
manages the storage and access of the primitive data
computed from any schema. The application developer must
manage the schema_id that is returned from the registration.
Preferably, the schema itself is expressed as a
NULL-terminated array of unsigned 32-bit integers, each
containing the ID of the desired primitive. The primitive IDs
referenced here must have previously been defined using
RegisterPrimitive, or must be one of the default primitives.

Primitive Design

The "pistons" of the VIR Engine are the primitives. A
primitive encompasses a given feature's representation,
extraction, and comparison function. There are a number of
heuristics which lead to effective primitive design. These
design constraints are not hard rules imposed by the Engine
architecture, but rather goals that lead to primitives which are
well-behaved. For a given application, an engineer may
choose to intentionally relax certain constraints in order to
best accommodate the tradeoffs associated with that domain.
The constraints are as follows:

meaningful: Primitives should encode information
  which will be meaningful to the end-users of the
  system. Primitives, in general, map to cognitively
  relevant image properties of the given domain.
compact: A primitive should be represented with the
  minimal amount of storage.
efficient in computation: Feature extraction should not
  require an unreasonable amount of time or resources.
efficient in comparison: Comparison of features should
  be extremely efficient. The formulation should take
  advantage of a threshold parameter (when available),
  and avoid extraneous processing once this threshold
  has been exceeded. The distance function should return
  results with a meaningfully dynamic range.
accurate: The computed data and the associated
  similarity metric must give reasonable and expected results for
  comparisons.
indexable: The primitive should be indexable. A
  secondary data structure should be able to use some associated
  value(s) for efficient access to the desired data.

In addition, primitives can provide their own "back door"
API's to the application developer, and expose parameters
that are controlled independently from the weights interface
of the VIR Engine. There is also ample opportunity for a set
of domain primitives to cooperate through shared data
structures and procedures (or objects) in such a way that they
can economize certain computations and information.

The primitives include a mechanism called "primitive
extensions" for enriching the API. This allows the
application greater control over the behavior of the primitive and
the results of comparisons. For example, a texture primitive
may expose a set of weights for sub-components of texture
such as periodicity, randomness, roughness and orientation.
These parameters would be specialized and independent of
the main texture weight passed through the Compare module
entry points.

Universal Primitives

Several "universal" or default primitives are included
with the Base VIR Engine. These primitives are universal in
the sense that they encode features which are present in most
images, and useful in a wide class of domain-independent
applications. Each of these primitives is computed using
only the original data of the image. There is no manual
intervention required to compute any of these primitives. A
developer can choose to mix-and-match these primitives in
conjunction with domain specific primitives to build an
application. These primitives have been designed based on
the above heuristics.

Global color: This primitive represents the distribution
  of colors within the entire image. This distribution also
  includes the amounts of each color in the image.
  However, there is no information representing the
  locations of the colors within the image.
Local color: This primitive also represents the colors
  which are present in the image, but unlike Global color,
  it emphasizes where in the image the colors exist.
Structure: This primitive is used to capture the shapes
  which appear in the image. Because of problems such
  as lighting effects and occlusion, it relies heavily on
  shape characterization techniques, rather than local
  shape segmentation methods.
Texture: This primitive represents the low level textures
  and patterns within the image. Unlike the Structure
  primitive, it is very sensitive to high-frequency features
  within the image.

Domain Specific Primitives

Applications with relatively narrow image domains can
register domain specific primitives to improve the retrieval
capability of the system. For applications such as retinal
imaging, satellite imaging, wafer inspection, etc., the
development of primitives that encode significant domain
knowledge can result in powerful systems. Primitives should obey
the design constraints listed above, but there is considerable
flexibility in this. For example, a wafer inspection primitive
may be designed to look for a specific type of defect. Instead
of an actual distance being returned from the distance
function, it can return 0.0 if it detects the defect, and 100.0
if not.

Analysis

Before an application can determine the similarity
between an image description and a set of candidate images,
the images must be analyzed by the engine. The resulting
feature data is returned to the caller to be used in subsequent
operations. Naturally, if an image is to be a candidate image
in future operations, the feature vector should be stored in a
persistent manner, to avoid re-analyzing the image.

analyze_image: This function accepts a memory buffer
  containing the original image data. It performs an
  analysis on the image by invoking the analysis
  functions of each primitive. The results of this computation
  are placed in memory and returned to the caller, along
  with the size of the data. Maintenance and persistent
  storage of this data is the caller's responsibility.
  Eventually, these structures are passed into the image
  comparison entry points.
destroy_features: This function is used to free the
  memory associated with a visual feature that was
  previously returned from analyze_image( ). Typically,
  this is called after the application has stored the data
  using the associated persistent storage mechanism.
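
A minimal Python sketch of the analysis entry point follows. It
assumes a schema modeled as an ordered list of (primitive ID,
analysis function) pairs and invents a simple header layout (a count
followed by ID and length pairs) so that the feature data can be
located again; the helper split_features, the two toy analysis
functions, and the IDs 1 and 2 are hypothetical. The Engine's real
header format is not given in the text.

```python
import struct
from typing import Callable, Dict, Sequence, Tuple

AnalysisFn = Callable[[bytes, int, int], bytes]


def analyze_image(rgb: bytes, height: int, width: int,
                  schema: Sequence[Tuple[int, AnalysisFn]]) -> bytes:
    """Run each primitive's analysis function in schema order and
    concatenate the results behind a header that maps the data."""
    if len(rgb) != height * width * 3:
        raise ValueError("buffer size does not match height * width * 3")
    parts = [(pid, fn(rgb, height, width)) for pid, fn in schema]
    header = struct.pack("<I", len(parts)) + b"".join(
        struct.pack("<II", pid, len(data)) for pid, data in parts)
    return header + b"".join(data for _, data in parts)


def split_features(fv: bytes) -> Dict[int, bytes]:
    """Use the header to recover each primitive's feature data."""
    (count,) = struct.unpack_from("<I", fv, 0)
    offset = 4 + 8 * count  # payloads start after the header
    features = {}
    for i in range(count):
        pid, length = struct.unpack_from("<II", fv, 4 + 8 * i)
        features[pid] = fv[offset:offset + length]
        offset += length
    return features


# Two toy analysis functions standing in for real primitives.
def mean_brightness(rgb: bytes, height: int, width: int) -> bytes:
    return struct.pack("<f", sum(rgb) / len(rgb))


def channel_sums(rgb: bytes, height: int, width: int) -> bytes:
    return struct.pack("<3I", sum(rgb[0::3]), sum(rgb[1::3]), sum(rgb[2::3]))


schema = [(1, mean_brightness), (2, channel_sums)]
image = bytes([10, 20, 30, 50, 60, 70])  # one row of two RGB pixels
feature_vector = analyze_image(image, 1, 2, schema)
features = split_features(feature_vector)
```

In Python the returned bytes object is reclaimed automatically, so no
counterpart to destroy_features is shown; in a language with manual
memory management the caller would release the buffer after storing it.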
Similarity/Scores 

Any image retrieval application requires the ability to
determine the similarity between the query description and
any of the candidate images. The application can then
display the computed similarity value of all of the candidate
images, or convey only the most similar images to the user.
To do this, similarity scores are computed by the engine for
the relevant candidate images. An application will call the
comparison functions provided by the engine. These
functions will return a score structure, which indicates the
similarity between the images being compared. The score
structure contains an overall numerical value for the
similarity of the two images, as well as a numerical value for
each of the primitives in the current schema. This allows
applications to use the values of the individual primitive
comparisons, if necessary.

When two images are compared by the engine, each of the
primitives in the current schema is compared to give an
individual similarity value for that primitive type. Each of
these scores must then be used to provide an overall score
for the comparison. In certain situations, these individual
primitive scores may need to be combined differently,
depending on the desired results. By altering the ways these
individual scores are combined, the application developer
has the ability to indicate relative importance between the
various primitives. For example, at times the color
distribution of an image will be much more important than its
texture characteristics. There may also be cases where only
some of the available primitives are required in order to
determine which images should be considered the most
similar.
Weights 

Applications are given flexibility in how the overall score
is computed through use of a weights structure. The weights
structure includes a weight for each primitive. The
application has control over the weight values for any given
comparison through the weights structure, and the following
functions.
create_weights: This function is used to allocate a
  weights structure for use in the compare functions. The
  associated schema_id will determine the specific
  format of the structure.
destroy_weights: This function is used to free the
  memory previously allocated with create_weights( ).
set_weight: This function sets the weight in the weights
  structure identified by the given primitive_id, which
  identifies the primitive whose weight is to be set. The
  value should be a positive floating point number. In
  general, weights are normalized before use by calling
  normalize_weights( ).
get_weights: This function is used to extract an
  individual weight value from a weights structure.
Note that other interesting visual parameters may be
surfaced in a user interface by combining the weights of the
primitives in intelligent ways. For example, a visual quantity
called "Composition" may be synthesized by controlling the
relative weighting of the color primitives.
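
The weights functions can be pictured with the short Python sketch
below. It models a weights structure as a dictionary keyed by
primitive ID; the initial weight of 1.0, the choice to normalize so
that the weights sum to one, the tolerance of a zero weight, and the
IDs 1 to 3 are assumptions made for illustration, since the text does
not specify them.

```python
from typing import Dict, Sequence

Weights = Dict[int, float]


def create_weights(schema: Sequence[int]) -> Weights:
    """Allocate one weight per primitive in the schema (assumed 1.0)."""
    return {primitive_id: 1.0 for primitive_id in schema}


def set_weight(weights: Weights, primitive_id: int, value: float) -> None:
    """Set one weight; negative values are rejected. Accepting zero is
    an assumption, used here as a way to switch a primitive off."""
    if primitive_id not in weights:
        raise KeyError(f"primitive {primitive_id} is not in this schema")
    if value < 0.0:
        raise ValueError("weights must not be negative")
    weights[primitive_id] = float(value)


def get_weight(weights: Weights, primitive_id: int) -> float:
    return weights[primitive_id]


def normalize_weights(weights: Weights) -> Weights:
    """Return a copy scaled so the weights sum to 1 (an assumption)."""
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    return {pid: w / total for pid, w in weights.items()}


weights = create_weights([1, 2, 3])
set_weight(weights, 1, 2.0)  # make primitive 1 twice as important
normalized = normalize_weights(weights)
```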

Two examples of utilizing weights with the primitives by 
use of the weights sliders (e.g., 208) in the query window 
200 (FIG. 3) are as follows: 

Texture: The VIR Engine evaluates pattern variations 
within narrow sample regions to determine a texture 
value. It evaluates granularity, roughness, 
repetitiveness, and so on. Pictures with strong textural 
attributes — a sandstone background for example — tend 
to be hard to catalog with keywords. A visual search is 
the best way to locate images of these types. For best 
results, a user should set Texture high when the query 
image is a rough or grainy background image and low 
if the query image has a central subject in sharp focus 
or can be classified as animation or clip-art. 
Structure: The VIR Engine evaluates the boundary char- 
acteristics of distinct shapes to determine a structure 
value. It evaluates information from both organic 
(photographic) and vector sources (animation and clip 
art) and can extrapolate shapes partially obscured. 
Polka dots, for example, have a strong structural ele- 
ment. For best results, a user should set Structure high 
when the objects in the query image have clearly 
defined edges and low if the query image contains 
fuzzy shapes that gradually blend from one to another. 
Comparison 

To get the result of an image comparison, the application
supplies the precomputed primitive vectors from two
images, together with a set of weights, to a first API called
Compare. The system fills in a score data structure and
returns a pointer to it to the caller. A second API called
CompareIntoScores caches the primitive component scores for
later use. A function RefreshScores can then efficiently
recompute a new score for a different set of weights (but the
same query image, i.e., a re-query). RefreshScores takes a
score structure and a weights structure, and recomputes a
final score (ranking) without needing to recompute the
individual primitive similarities. A third API call
(ThresholdCompare) is an extension of the first, in that the
user also supplies a threshold value for the score. Any image
having a distance greater than this value is considered
non-qualifying, which can result in significant performance
gains since it will probably not be necessary to compute
similarity for all primitives.
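
The re-query idea can be sketched as follows: once the per-primitive scores are cached, a new overall score for a different set of weights is just a new weighted sum. The structure layout, the `MAX_PRIMITIVES` bound, and the function signature are assumptions for illustration and are not taken from the text.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRIMITIVES 8   /* illustrative upper bound, not from the text */

/* Hypothetical score structure: cached per-primitive scores plus the total. */
typedef struct {
    size_t count;
    double primitive[MAX_PRIMITIVES];
    double overall;
} scores_t;

/* Recompute the overall score from the cached primitive scores and a new
   set of weights. No primitive comparison function is called here. */
double refresh_scores(scores_t *s, const double *weights)
{
    double total = 0.0;
    assert(s->count <= MAX_PRIMITIVES);
    for (size_t i = 0; i < s->count; i++)
        total += weights[i] * s->primitive[i];
    s->overall = total;
    return total;
}
```

With cached primitive scores of 10 and 30, equal weights give an overall score of 20, and changing the weights to 0.25 and 0.75 gives 25 without touching the feature vectors.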

Every application may have unique requirements in the 
way the application determines which images are to be 
considered most similar, and how to efficiently manage a 
changing set of results. Certain applications may need to do 
an exhaustive comparison of all images in the candidate set 
while others are only "interested" in a certain set which are 
most similar to the query description. Certain applications 
(or situations) may also require the ability to quickly 
manipulate the relative importance of the primitives, using 
the individual primitive scores and weights, as discussed 
above. In another embodiment of the present engine, com- 
parison functions may be structured as follows: 

compare — This is the simplest entry point for computing
the overall visual similarity for two given images,
represented by their respective visual features. The
caller passes in a weights structure and two feature
vectors, and compare( ) computes and returns the
weighted overall score, which is a numerical value
preferably in the range [0.0 . . . 100.0]. This function
can be used when a score is required for every candidate
image. If only the top N scores are required, the
function threshold_compare( ) may be more appropriate.
heterogeneous_compare — This is a variation of the
standard compare described above. In the standard
compare, the schemas for the two images have the same
primitives. In the heterogeneous compare, each of the two
images may have been analyzed by use of a different schema.
For example, a feature vector for image A is based on
a different set of primitives than a feature vector for
image B.

threshold_compare — This function can be used for
optimized searches in which the scores of every single
candidate image are not required. A threshold similarity
distance is passed in to indicate that any image whose
score is above the threshold is not of interest for this
search. As soon as the engine determines that the image
is outside this range, it terminates the similarity
computation and returns a flag to indicate that the threshold
has been exceeded. This provides a significant performance
boost when top N style searches are sufficient.
Top N queries will be described in conjunction with
FIG. 14. Again, it is the application's responsibility to
determine the appropriate threshold value for each
comparison.
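
One way an application might choose the threshold for each comparison is to keep the N best scores found so far and pass the current Nth score as the threshold. The sketch below illustrates this under several assumptions that are not in the text: each primitive is reduced to a single number, the per-primitive distance is an absolute difference, the weights are already normalized and listed in descending order, and all names and signatures are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NPRIM 3   /* illustrative number of primitives per feature vector */

/* Hypothetical feature vector: one scalar per primitive, each in [0, 100]. */
typedef struct { double feature[NPRIM]; } fv_t;

/* Distance for one primitive: absolute difference (an assumed metric). */
static double primitive_distance(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Returns 1 and stores the score if it stays within the threshold; returns 0
   as soon as the partial weighted score exceeds the threshold. */
int threshold_compare(const fv_t *q, const fv_t *c, const double *w,
                      double threshold, double *score)
{
    double partial = 0.0;
    for (size_t i = 0; i < NPRIM; i++) {
        partial += w[i] * primitive_distance(q->feature[i], c->feature[i]);
        if (partial > threshold) return 0;   /* skip remaining primitives */
    }
    *score = partial;
    return 1;
}

/* Keep the n smallest scores in ascending order; returns how many were kept. */
size_t top_n_query(const fv_t *q, const fv_t *db, size_t db_size,
                   const double *w, size_t n, double *best, size_t *best_idx)
{
    double t = 100.0;   /* initial threshold: top of the score range */
    size_t c = 0;       /* number of results so far */
    assert(n > 0);
    for (size_t k = 0; k < db_size; k++) {
        double s;
        if (!threshold_compare(q, &db[k], w, t, &s)) continue;
        if (c == n && s >= t) continue;        /* no better than the Nth result */
        size_t pos = (c < n) ? c : n - 1;      /* new slot, or replace the last */
        while (pos > 0 && best[pos - 1] > s) { /* shift larger scores down */
            best[pos] = best[pos - 1];
            best_idx[pos] = best_idx[pos - 1];
            pos--;
        }
        best[pos] = s;
        best_idx[pos] = k;
        if (c < n) c++;
        if (c == n) t = best[n - 1];           /* tighten the threshold */
    }
    return c;
}
```

Once N results are held, every later candidate is compared against the score of the current Nth result, so poor candidates are usually rejected after only the most heavily weighted primitives have been evaluated.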
Query Optimization 

A final aspect of the Extensible Engine is the notion of 
query optimization. Each primitive provides a similarity 
function to the Engine. During the "threshold compare" 
operation, the Engine attempts to visit the primitives in an 
order such that it can determine as cheaply as possible if the 
comparison score will exceed the passed-in threshold. As 
soon as it is exceeded, the rest of the primitive comparisons
are aborted. Two main factors play into the query
optimization scheme: the weighting associated with that primitive,
and the cost of executing the comparison operation for that
primitive. Application developers can tell the Engine what
the cost of their primitive's similarity function is during the
registration process. Developers that construct their own
primitives can help the optimizer by providing accurate cost
information for their custom Compare function. The following
description explains how to determine the cost of the
custom Compare function for the new primitive.

The cost value is a positive number which cannot be 0.0.
If the application uses all custom primitives, then the actual
values of these costs are not important. They should merely
be relatively correct. Values of 1.0, 2.0, and 3.0 have the
same effect as 100, 200, and 300. However, if the application
developer wishes to integrate some custom primitives with the
default primitives previously described, then the cost values
must be calibrated with respect to the cost values for the
default primitives.

In one presently preferred embodiment, the nominal base- 
line for computation cost may be arbitrarily set by defining 
that the VIR_GLOBAL_COLOR primitive has a cost of 
1.0. On this scale, the default primitives have the following 
costs:

    Primitive       Cost
    Global Color    1.00
    Local Color     2.20
    Texture         4.10
    Structure       2.30

To calibrate a custom primitive against this cost scale,
some empirical experiments must be performed and the
execution of the new procedures timed relative to the time
taken by the Global Color primitive. This ratio is the cost
value that should be passed to the primitive registration
procedure. A skeleton benchmark application is provided as
an example with the Extensible Engine API that can be used
to help develop new primitives and assess their cost. It
constructs a schema with only the Global Color primitive as
a timing baseline. The application developer then can
construct a schema with only the new primitive to establish its
cost relative to the Global Color primitive.

If the cost value for a new primitive is unknown, or if its 
execution time varies widely depending on the image that is 
being analyzed, then it is best to estimate the cost, or use the 
value 1.0. 
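
The calibration described above amounts to timing two compare functions over the same data and taking the ratio. The following is a rough sketch of that measurement and is not the skeleton benchmark application mentioned in the text; the function signature, the two stand-in compare functions, and the fallback to 1.0 when the timer is too coarse are all assumptions.

```c
#include <assert.h>
#include <time.h>

/* Signature assumed for a primitive's compare function in this sketch. */
typedef double (*compare_fn)(const double *a, const double *b, int n);

/* Stand-in for the baseline (Global Color) compare: one pass over the data. */
double baseline_compare(const double *a, const double *b, int n)
{
    double d = 0.0;
    for (int i = 0; i < n; i++)
        d += (a[i] > b[i]) ? a[i] - b[i] : b[i] - a[i];
    return d;
}

/* Stand-in for a custom compare that does roughly three times the work. */
double custom_compare(const double *a, const double *b, int n)
{
    double d = 0.0;
    for (int pass = 0; pass < 3; pass++)
        d += baseline_compare(a, b, n);
    return d;
}

/* CPU time, in clock ticks, consumed by `iterations` calls to fn. */
static double time_compare(compare_fn fn, const double *a, const double *b,
                           int n, long iterations)
{
    volatile double sink = 0.0;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        sink += fn(a, b, n);
    (void)sink;
    return (double)(clock() - start);
}

/* Cost of `candidate` relative to `baseline`, whose cost is defined as 1.0.
   Falls back to 1.0 when the timer resolution is too coarse to measure. */
double relative_cost(compare_fn baseline, compare_fn candidate,
                     const double *a, const double *b, int n, long iterations)
{
    assert(iterations > 0);
    double base = time_compare(baseline, a, b, n, iterations);
    double cand = time_compare(candidate, a, b, n, iterations);
    if (base <= 0.0 || cand <= 0.0) return 1.0;
    return cand / base;
}
```

The measured ratio varies from run to run and from machine to machine, so in practice it would be averaged over many images before being passed to the registration procedure.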

Flowchart and Architecture Descriptions 

Referring to FIG. 6, the components of the extensible VIR
engine will be described. As previously described above, the
components are part of the "C" API. Of course, other
computer languages can be used for the API. The extensible
VIR engine 300 includes three main components: an
analyzer 302, a comparator 304 and a primitive registration
interface 306. The analyzer 302 is similar to the analysis
module 122 and the comparator 304 is similar to the image
comparison module 124, previously shown in FIG. 1A. The
analyzer 302 has an analyze interface 308 to communicate
with external components. The analyze interface 308
receives an RGB format image as input 314 and generates a
feature vector as output 316. The comparator 304 has two
interfaces, a weights and scores interface 310 and a compare
interface 312. The weights and scores interface 310
communicates with a management function 318 handled by the
application. The compare interface 312 receives two feature
vectors, a target feature vector 320 and a feature vector
322 for the current image being tested or compared.
Associated with the extensible VIR engine 300 is a set of
primitives. A developer can specify a set of primitives that
are to be used for a particular image domain. The extensible
VIR engine 300 includes four universal or default
primitives: local color 330, global color 332, texture 334, and
structure 336. The developer may choose to use one or any
number of these universal primitives for his application.
In addition, the developer may define one or more custom
primitives and register the primitives with the primitive
registration interface 306. The process of registering new
custom primitives will be further described hereinbelow.

Referring now to FIG. 7, an exemplary VIR system 
utilizing the extensible VIR engine 300 will be described. 
The extensible VIR engine 300 communicates with the user 
102 through a user interface 350. The user interface 350 may 
include modules such as the Query Canvas module 118 and 
the Image Browsing module 110, which were previously 
described in conjunction with FIG. 1A. The extensible VIR 
engine 300 also is in communication with persistent storage 
132 through a database interface 130. The database interface 
130 is typically a database engine such as previously 
described above. An application developer has complete 
freedom in defining the user interface 350 and the database 
interface 130 to meet the needs of the particular domain at 
issue. 

Referring to FIG. 8, an operational flow 360 of the
extensible VIR engine 300 will now be described. The
engine flow 360 is invoked by an application such as the
example shown in FIG. 7. Beginning at a start state 362, the
engine moves to process 364 to register one or more
primitives through the primitive registration interface 306
(FIG. 6). Process 364 will be further described in
conjunction with FIG. 13. In typical operation of the extensible VIR
engine 300, the user will provide a query object, such as
through use of the Query Canvas 108 (FIG. 5A) or by
browsing the file system 110 to identify the query object.
Moving to a run analyzer process 366, a query object is
analyzed by the analyzer 302 (FIG. 6) to create a feature
vector for the query image. Proceeding to state 368, the user
typically provides or sets weights through the user interface
350 (FIG. 7). Moving to a run comparison process 370, the
comparator 304 (FIG. 6) determines a similarity score for
the two feature vectors that are passed to it. The compare
operation is typically performed on all the images in the
database 132 unless a database partition has been identified
or another scheme to compare or test only a portion of the
images in database 132 is established. Once all the images
have been compared by the run comparison process 370, the
engine moves to end state 372 and control returns to the
calling application.

Referring to FIG. 9, another embodiment of a VIR system 
utilizing the extensible VIR engine 300 will now be 
described. As previously described in conjunction with FIG.
5A, several methods of generating a query have been shown.
One of these methods includes the query generation and 
Query Canvas method 242/108, whereby the user draws or 
sketches a query image or modifies an existing image. 
Alternatively, the user may browse the file system 390 to 
identify an object or image to be used as the query 314. The 
query object 314 is passed onto the analyzer 302 for analysis 
to generate a feature vector 316 for the query. The feature 
vector 316 is sent to the database engine 130. Generally, the 
feature vector for the query image is only needed tempo- 
rarily to process the query. The query feature vector is 
usually cached in random access memory (RAM) associated 
with the database engine 130, for the query operation. For 
some database implementations, the query feature vector is 
placed in a temporary table by the database engine 130. 

A feature vector for the query target 320 and a feature 
vector 322 for one of the images in the database store 132 
are retrieved by the database engine 130 and sent to the 
comparator 304 for comparison. At the comparator 304, a 
thresholding decision 394 is checked to determine if thresh- 
olding is to be applied to the comparison method. If not, a 
standard comparison 396 will be performed utilizing the 
weights 400 as set by the user 102 (FIG. 1A). The standard 
comparison 396 will be further described in conjunction 
with FIG. 11. If thresholding is desired, the comparison will 
be performed by the threshold comparison process 398 also 
utilizing the weights 400. The threshold comparison 398 will 
be further described in conjunction with FIG. 12. A simi- 
larity score 324 is output by either the threshold comparison 
398 or the standard comparison 396. The similarity score 
324 is utilized by the calling application for use in presenting 
the resultant images. Presentation may be putting thumb- 
nails in a ranked order, for example. 

Referring to FIG. 10, the analysis performed by the run 
analyzer process 366 (FIG. 8) will now be described. Recall 
that a schema is a collection of primitives defined by a 
developer or application programmer. These primitives may 
include some or all of the universal primitives built into the 
VIR engine and any custom primitives defined by the 
developer for a schema. Also recall that each custom primi- 
tive must have an analysis function and a comparison 
function, and the primitive is registered through the primi- 
tive registration interface 306 (FIG. 6). These functions 
along with the analysis and comparison functions for the 
universal primitives are all stored in a lookup table for the 
schema. 
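
A per-schema lookup table of this kind could be represented as an array of entries holding function pointers. The sketch below is illustrative only: the entry layout, the function signatures, the `MAX_SCHEMA_PRIMITIVES` limit, and the two placeholder functions are assumptions and do not reproduce the engine's actual registration interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SCHEMA_PRIMITIVES 16   /* illustrative limit */

/* Assumed signatures; the text only says each primitive supplies an
   analysis function and a comparison function. */
typedef size_t (*analysis_fn)(const unsigned char *rgb, int width, int height,
                              double *out);
typedef double (*comparison_fn)(const double *a, const double *b);

typedef struct {
    int           primitive_id;
    analysis_fn   analyze;
    comparison_fn compare;
} primitive_entry;

typedef struct {
    int             schema_id;
    size_t          count;
    primitive_entry entries[MAX_SCHEMA_PRIMITIVES];
} schema_table;

/* Add a primitive to the schema. Returns 0 on success, or -1 if a function
   is missing, the table is full, or the id is already present. */
int schema_add_primitive(schema_table *s, int primitive_id,
                         analysis_fn analyze, comparison_fn compare)
{
    assert(s != NULL);
    if (analyze == NULL || compare == NULL) return -1;
    if (s->count == MAX_SCHEMA_PRIMITIVES) return -1;
    for (size_t i = 0; i < s->count; i++)
        if (s->entries[i].primitive_id == primitive_id) return -1;
    s->entries[s->count].primitive_id = primitive_id;
    s->entries[s->count].analyze = analyze;
    s->entries[s->count].compare = compare;
    s->count++;
    return 0;
}

/* Look up a primitive by id; NULL if the schema does not contain it. */
const primitive_entry *schema_find(const schema_table *s, int primitive_id)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->entries[i].primitive_id == primitive_id)
            return &s->entries[i];
    return NULL;
}

/* Placeholder functions so the table can be exercised; not real primitives. */
size_t demo_analyze(const unsigned char *rgb, int width, int height, double *out)
{
    (void)width; (void)height;
    out[0] = rgb[0];
    return 1;
}

double demo_compare(const double *a, const double *b)
{
    return a[0] > b[0] ? a[0] - b[0] : b[0] - a[0];
}
```

Keeping both functions in one entry means the analyzer and the comparator can resolve a primitive with the same lookup.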

The process 366 takes as input an image and provides as
output a feature vector. Beginning at a start analysis state
410, the analysis process 366 moves to a state 412 to
construct a header for the feature vector. A schema ID for the
object or image that is to be analyzed is an input to the
construct header state 412. The schema ID is obtained from
the schema creation process described in conjunction with
FIG. 13. The user identifies the schema to be used for
analysis of the visual objects through the application
program. Using the schema ID, the corresponding schema or
lookup table structure is accessed which lists the respective
primitives and functions. There is one individual lookup
table per schema. Accessing the first primitive in the lookup
table for the schema at state 414, the analysis process 366
proceeds to state 416 and looks up the analysis function for
that primitive in the schema lookup table. Proceeding to
state 418, the analysis function for the current primitive is
called and the analysis function is performed. The input to
the analysis function at state 418 is the image to be analyzed
including its height and width characteristics. The output of
state 418 is the feature data for the current primitive which
is placed in the feature vector under construction. Any of
various statistical techniques are used in the analysis
function for the current primitive. For example, histogramming
could be used, such as a color histogram. As another
example, a mean intensity primitive could be defined as the
sum of the intensity of all the pixels in an image divided by
the number of pixels in the image. These techniques are
well-known by those skilled in the relevant technology.
Proceeding to decision state 420, the analysis process 366
determines if there are additional primitives in the current
schema that need to be processed. If so, the analysis process
366 moves back to state 414 to access the next primitive in
the current schema. If all the primitives in the current
schema have been processed, the analysis process proceeds to
state 422 to finalize the feature vector for the current
image. At state 422, the analysis process 366 computes the
total resulting size of the feature data and updates the size
in the header for the feature vector. In another embodiment,
checksums are also computed at state 422. The complete
feature vector contains the header information and the
feature data for each of the primitives in the schema. The
analysis process 366 completes at a done state 424.
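
The analysis loop of FIG. 10 can be sketched in C as follows. The feature vector layout, the function signature, and the `MAX_FEATURES` capacity are assumptions, each toy primitive here produces a single value, and the intensity of a pixel is taken as the average of its R, G and B components, which the text does not specify.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FEATURES 32   /* illustrative capacity of the feature data */

/* Hypothetical feature vector: a small header followed by feature data. */
typedef struct {
    int    schema_id;
    size_t size;                /* number of feature values, filled in last */
    double data[MAX_FEATURES];
} feature_vector;

/* Assumed signature: writes feature values to out and returns how many. */
typedef size_t (*analysis_fn)(const unsigned char *rgb, int width, int height,
                              double *out);

/* Mean intensity: sum of the pixel intensities divided by the pixel count. */
size_t mean_intensity(const unsigned char *rgb, int width, int height,
                      double *out)
{
    long npix = (long)width * height;
    double sum = 0.0;
    for (long p = 0; p < npix; p++)
        sum += (rgb[3 * p] + rgb[3 * p + 1] + rgb[3 * p + 2]) / 3.0;
    out[0] = sum / (double)npix;
    return 1;
}

/* Mean of the red channel, a second toy primitive. */
size_t mean_red(const unsigned char *rgb, int width, int height, double *out)
{
    long npix = (long)width * height;
    double sum = 0.0;
    for (long p = 0; p < npix; p++)
        sum += rgb[3 * p];
    out[0] = sum / (double)npix;
    return 1;
}

/* Run every analysis function of the schema in order and pack the results.
   Returns 0 on success, -1 if the feature data would not fit. */
int analyze(int schema_id, const analysis_fn *fns, size_t nfns,
            const unsigned char *rgb, int width, int height,
            feature_vector *fv)
{
    size_t used = 0;
    assert(width > 0 && height > 0);
    fv->schema_id = schema_id;                 /* construct the header */
    fv->size = 0;
    for (size_t i = 0; i < nfns; i++) {        /* one pass per primitive */
        if (used >= MAX_FEATURES) return -1;   /* each primitive writes one value */
        used += fns[i](rgb, width, height, fv->data + used);
    }
    fv->size = used;                           /* finalize: record the size */
    return 0;
}
```

For a two-pixel image with RGB values (30, 60, 90) and (90, 120, 150), the resulting feature vector holds a mean intensity of 90 and a mean red value of 60.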

Referring now to FIG. 11, the standard comparison
process 396 shown in FIG. 9 will be described. In a manner
similar to the analysis process 366 previously described, a
comparison function for each custom primitive must be
registered through the primitive registration interface 306
(FIG. 6). The registered comparison functions are stored in
the schema lookup table. The input utilized by the standard
comparison process 396 includes two feature vectors to be
compared and weights for each primitive. If the primitives
for each of the two feature vectors are the same, the standard
comparison is considered to be a homogeneous comparison.
However, if each of the two feature vectors is associated
with a different schema, but has at least one primitive in
common between the two feature vectors, the comparison is
considered to be a heterogeneous comparison. As will be
seen below, the standard comparison process 396
accomplishes either type of comparison.
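
The distinction between homogeneous and heterogeneous comparison can be illustrated with a short C sketch in which only the primitives present in both feature vectors contribute to the score. All names and layouts are hypothetical; each primitive is reduced to a single value compared by absolute difference, and the weights are indexed by position in the first feature vector.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRIMS 8   /* illustrative limit */

/* Hypothetical one-dimensional feature data, tagged with its primitive id. */
typedef struct { int primitive_id; double value; } feature;
typedef struct { size_t count; feature items[MAX_PRIMS]; } feature_vector;

/* Find a primitive in a feature vector; NULL if it is not present. */
static const feature *find_primitive(const feature_vector *fv, int id)
{
    for (size_t i = 0; i < fv->count; i++)
        if (fv->items[i].primitive_id == id)
            return &fv->items[i];
    return NULL;
}

/* Weighted sum of per-primitive distances over the primitives that exist in
   both vectors. weights[i] belongs to fv1->items[i]. The number of primitives
   actually compared is stored in *n_common when n_common is not NULL. */
double compare(const feature_vector *fv1, const feature_vector *fv2,
               const double *weights, size_t *n_common)
{
    double total = 0.0;
    size_t common = 0;
    assert(fv1->count <= MAX_PRIMS && fv2->count <= MAX_PRIMS);
    for (size_t i = 0; i < fv1->count; i++) {
        const feature *other = find_primitive(fv2, fv1->items[i].primitive_id);
        if (other == NULL) continue;   /* not in FV2: skip this primitive */
        double d = fv1->items[i].value - other->value;
        if (d < 0.0) d = -d;
        total += weights[i] * d;
        common++;
    }
    if (n_common != NULL) *n_common = common;
    return total;
}
```

When both vectors come from the same schema, every primitive is compared and the result is a homogeneous comparison; otherwise only the shared primitives are scored.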

Beginning at a start comparison state 440, the comparison 
process 396 moves to a state 442 to construct a score 
structure for the comparison. The score structure is initial- 
ized to be an empty score structure at this point. The score 

60 structure contains space for one score per primitive plus an 
overall score. Proceeding to state 446, the comparison 
process 396 accesses a primitive in feature vector 1 (FV1), 
which is associated with the first of the two images being 
compared by the comparison process. For instance, FV1 

65 may be the result of analyzing the target image. Moving to 
a decision state 448, the comparison process 396 determines 
if the primitive accessed in state 446 exists in feature vector 
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2 (FV2), which is associated with the second of the two mance benefits to be gained by exploiting the primitive 
images being compared. FV2, may, for instance, correspond architecture of the VIR engine to intelligently process the 
to a candidate image. If the same primitive does exist in comparison. Comparisons proceed by computing the primi- 
feature vector 2, the comparison process 396 proceeds to tive comparison scores for the most heavily weighted primi- 
state 450 to look up the comparison function, for the current 5 tives first, and trying to prove as soon as possible that the 
primitive in the schema lookup table for FV1. Continuing at threshold has been exceeded. If the threshold is exceeded, 
state 452, the feature data associated with the current primi- the rest of the primitive comparisons are then aborted, 
tive from both feature vector 1 and feature vector 2 is Similar to the standard comparison process 396, previously 
unpacked. Recall that each feature vector is a concatenation described, two feature vectors and corresponding weights 
of feature data elements corresponding to the set of primi- 10 are input to the threshold comparison process. An additional 
tives in the schema. Advancing to state 454, the compare input is a threshold value, preferably in the range of 0 to 100. 
function accessed at state 450 is invoked and receives the The threshold comparison process 398 also performs both 
feature data unpacked at state 452. The result of calling and homogeneous compares and heterogeneous compares (as 
executing the compare function at state 454 is a primitive done by the Standard Compare). The threshold comparison 
score. An exemplary primitive having only one dimension or 15 process 398 can be performed on both the Base VIR Engine 
property is mean intensity. In this example, the distance or and the Extensible VIR Engine. However, the Base VIR 
primitive score between feature vector 1 and feature vector Engine may also perform a heterogeneous compare. In one 
2 could be (X1-X2). For primitives having multiple preferred embodiment, a heterogeneous compare can be 
dimensions, such as texture which may have as many as 35 performed only if at least one of the schemas utilizes a 
dimensions, the presently preferred embodiment uses a 20 subset of the default primitives. 

Euclidean metric. An equation for an exemplary Euclidean Beginning at a start comparison state 480, the threshold 

metric is as follows: comparison process 398 proceeds to state 482 to construct a 

score structure for the comparison. The score structure is 
y/2 initialized to be empty. Continuing at a state 484, the 

5 i = £ (FVy tO" FV2( /» 2 25 primitives of feature vector 1 (FV1), in the presently pre- 

J > ferred embodiment, are ordered by weights, with the highest 

weighted primitive ordered first and the lowest weighted 
Other techniques to determine the primitive score, such as P rimitive ordered last. A cost is optionally associated with 
histogram intersection or other histogram techniques, may each primitive to further order the primitives. The costs were 
be used in other embodiments. 30 previously described in the query optimization description. 

Moving to state 456, the primitive score or feature score cost value and the weight can be combined by a 

is placed into the score structure constructed at state 442 developer-defined function to order the primitives. For 
above. Continuing at a decision state 458, the comparison exam P le > the fanctl0n couW b f. multiplication. As another 
process 396 determines if there are additional primitives in exam P k > * th t ^ are ™^ed to [0 .1] beforehand, 
feature vector 1 that need to be processed. If so, the 35 a Manmum function can be used as follows. Max((1.0- 
comparison process 396 moves back to state 446 to access «*0. J" another embodiment, only the costs are 

the next primitive in feature vector 1. A loop of states 446 ™£ to °«? er mc pnnMtives^ 

through 458 is performed until all primitives in feature Proceeding to state 486 the : highest wetghted praitive in 

vector 1 have been processed. When decision state 458 fefj™ vecto f 1 15 accessed. Subsequent states 488 through 
determines that all primitives have been processed in feature 40 496 ■« «milar to states 448 through 456 of the standard 
vector 1, comparison process 396 proceeds to state 460 comparison process 396 shown in FIG. 11, and thus will not 
wheremthescoresstoredinthescorestructurearecombined b * descnbed m detail here. If the primitives of the two 
with the weights 400 (FIG. 9) for each of the primitives feature vectors are „ in J c ° mmo ?'^ e comparison function for 
passed into the comparison process to generate a final «° e P rtol ] ive * ^ < state 494 > and the P nmltlve 
combined score. The final combined score may be generated 45 computed and stored in the score structure at state 496. 
by a Linear combination or a weighted sum: Roving to state 498, a parLal final score is computed using 

the weights and the scores stored m the score structure so far. 
Moving to a decision state 500, the threshold comparison 
s l = }_j w i s t process 398 determines if the partial final score, also known 

50 as a weighted primitive score, exceeds the threshold passed 
into the comparison process 398. If the threshold has not 
The comparison process 396 completes at a done state 462. been exceeded, as determined at decision state 500, the 

Returning to decision state 448, if the current primitive comparison process 398 continues at a decision state 502 to 
that is accessed in feature vector 1 at state 446 does not exist determine if there are additional primitives to be processed, 
in feature vector 2, comparison process 396 moves down to 55 If there are additional primitives to be processed, threshold 
decision state 458 to determine if additional primitives exist comparison process 398 moves back to state 486 to access 
in feature vector 1, thereby bypassing calling the compare the next highest ordered primitive in feature vector 1. A loop 
function for the current primitive of feature vector 1. This of states 486 through 502 continues; until all primitives in 
allows feature vectors from different schemas to be com- feature vector 1 are processed unless the threshold has been 
pared but the comparison is only on primitives that are in 60 exceeded as determined at decision state 500. If the thresh- 
common between the feature vectors. If all the primitives old has been exceeded at decision state 500, the threshold 
between the two feature vectors are in common, the com- comparison process 398 aborts the loop, moves to done state 
parison will be done for each of the primitives and is a 506 and returns with an indication that the threshold has 
homogeneous comparison. been exceeded. 

Referring to FIG. 12, the threshold comparison process 65 Returning to decision state 502, if all primitives in feature 
398 previously shown in FIG. 9 will now be described. The vector 1 have been processed, threshold comparison process 
threshold based comparison 398 allows significant perfor- 398 moves to state 504 to determine a final combined score. 
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State 504 is optional if the score from state 498 has been code from the threshold compare process 398 is "normal" 

saved. If the score has not been saved, the final score is (ok). If so, query process 550 proceeds to a decision state 

computed using the scores stored in the score structure and 562 to determine if the number of results so far is less than 

the weights. The threshold comparison process 398 returns the desired number of results (C<N). If so, query process 

with a normal indication at the completion of state 504 and 5 550 moves to state 564 to add the score S ; returned from the 

completes at the done state 506. threshold compare process 398 to the query results list in an 

Referring to FIG. 13, a schema creation and primitive order that is sorted by score. The number of entries in the 

registration process 520 will be described. This logic is sorted results list thereby increases by one and has 

executed by the application. A developer may typically entries. Moving to state 566, query process 550 increments 

create a new schema for a certain domain of objects or 10 the result count C by one. Proceeding to a decision state 568, 

images. Examples of domains where new schemas may be the query process 550 determines if the number of results so 

created include face recognition, mammography, ophthal- far is equal to the desired number of results (C=N). If so, the 

mological images and so forth. As previously described, query process 550 advances to state 570 wherein threshold 

each custom primitive requires a primitive ID, a label, an T is set equal to the score (score^) of the N** (last) result in 

analysis function, a compare function, a swap (endian) is the sorted results list. The query process 550 continues at a 

function and a print function. This process 520 is a portion decision state 580 to determine if there are additional objects 

of the primitive registration interface 306 (FIG. 6). having feature vectors (FV,) in the database 132. If so, query 

Beginning at a start state 522, the schema creation process process 550 moves back to state 556 to access the next 

520 proceeds to state 524 to create a new schema. Creating feature vector in the database store 132. A loop of states 

a new schema is a function of the extensible VIR engine 300. 20 556-580 is executed until all the feature vectors in the 

The output of state 524 is a schema ID which allows the database store 132 have been processed, at which time the 

registered primitives to be identified. The results of state 524 query process 550 is finished at a done state 582. 

also include an empty schema structure, which includes the Returning to the decision state 568, if the value of C does 

schema ID. Moving to state 526, a primitive desired for this not equal the value of N, the query process 550 proceeds to 

schema is added to the schema structure. Adding the primi- 25 the decision state 580 to determine if there are additional 

tive to the schema is a function of the extensible VIR engine feature vectors to process, as previously described. The 

300. Moving to a decision state 528, the schema creation threshold T is not changed in this situation, 

process 520 determines if another primitive is to be added to Returning to the decision state 562, if the value C is not 

the current schema If so, process 520 moves back to state less than the value of N (i.e., ON), the query process 550 

526 to add the next desired primitive to the schema. When 30 continues at a decision state 572. At decision state 572, a 

all desired primitives have been added to the schema as determination is made as to whether the score S,- returned 

determined at decision state 528, schema creation process from the threshold compare process 398 is less than thresh- 

520 completes at a done state 530. At this point, a final old T (which is either the initialization value of 100 or the 

schema table identified by the schema ID and including all score of result N of the sorted results list set by either state 

the desired primitives has been created. The desired primi- 35 570 or state 578 in a prior pass of the process 550). If not, 

tives may include any custom primitives or any of the (i.e., S t - is equal to or greater than T) query process 550 

default or standard primitives, such as global color, provided proceeds to the decision state 580 to determine if there are 

in a library. additional feature vectors to process, as previously 

Referring to FIG. 14, the top "N" query process 550 will described. However, if the score S, is less than T, as 

now be described. The top N query is an exemplary usage of 40 determined at decision state 572, the query process 550 

the threshold comparison 398 by an application to provide a
performance gain. The top N query process 550 is used in a
search where a fixed number of results "N" is desired and N
is known beforehand, e.g., N is provided by the application
program. When N is small compared to the size of the
database to be searched, the use of the threshold comparison
398 can result in a significant increase in speed of processing.
The inputs to this process 550 are the query target object
to be searched against represented by its feature vector
FV_target, the weights for the primitives in this feature
vector, and the desired number of results "N".

Beginning at a start state 552, query process 550 moves
to state 554 wherein initialization is performed: a query
results list is cleared to an empty state, a threshold variable
"T" is set to be 100 (the maximum value of the preferred
range [0 . . . 100]), and a result count variable "C" (the
number of results so far) is set to zero. The count C will be
in the range 0≤C≤N. Proceeding to state 556, query process
550 accesses the feature vector FV_i for the first object in the
database store 132 (FIG. 9). The query process 550 then
calls the threshold compare process 398 (FIG. 12) which is
a function of both the extensible VIR Engine 300 and Base
VIR engine 120. The feature vectors for the target object
(FV_target) and the current object (FV_i) (from state 556)
along with the primitive weights and the threshold T are all
passed in to the threshold process 398. Moving to a decision
state 560, the query process 550 determines if the return

proceeds to state 574 wherein the new result score S_i is
inserted into the results list sorted by score. At this time, the
results list temporarily has N+1 entries. Advancing to state
576, the query process 550 deletes the last result (N+1) in the
sorted results list. Moving to state 578, the query process
550 sets threshold T equal to the score (score_N) of the new
Nth (last) result in the sorted results list. The query process
550 continues at the decision state 580 to determine if there
are additional objects having feature vectors (FV_i) in the
database 132, as previously described.

Returning to the decision state 560, if the return code from
the threshold compare process 398 is "threshold exceeded",
the score for the current feature vector is ignored and the
query process 550 proceeds to the decision state 580 to
determine if there are additional feature vectors to process,
as previously described.

The output of the query process 550 is the sorted results
of the top N feature vectors. This output is sorted by score.

IV. APPLICATIONS

The VIR Engine directly implements the Visual Information
Model previously described and acts as the hub around
which all specific applications are constructed. The Engine
serves as a central visual information retrieval service that
fits into a wide range of products and applications. The
Engine has been designed to allow easy development of both
horizontal and vertical applications.
Vertical Applications

Because the facility of content-based image retrieval is
generic, there is a large potential for developing the VIR
technology in several vertical application areas, such as:

digital studio 

document management for offices 
digital libraries
electronic publishing 

face matching for law enforcement agencies 
medical and pharmaceutical information systems 
environmental image analysis 
on-line shopping 
design trademark searching 
internet publishing and searching 
remotely sensed image management for defense 
image and video asset management systems 
visual test and inspection systems

To explain why the VIR technology is a central element 
in these applications, let us consider some application pos- 
sibilities in detail. 
Environmental Imaging 

Environmental scientists deal with a very large number of 
images. Agencies such as NASA produce numerous satellite 
images containing environmental information. As a specific 
example, the San Diego Bay Environmental Data Reposi- 
tory is geared towards an . . . 

" . . . understanding of the complex physical, biological 
and chemical processes at work in the Bay ... it is 
possible to correlate these different kinds of data in both 
space and time and to present the data in a visual form 
resulting in a more complete picture of what is and 
what is not known about the Bay .... This is the kind 
of information that is required to assist decision makers 
in allocating scarce resources in more effective and 
informative monitoring programs by sharing data, 
eliminating redundant monitoring and reallocating 
resources to more useful and effective purposes. 
Another key component of this work is to provide all of 
these data and resultant analyses to the public-at-large 
. . . through the World-Wide Web of the Internet."
(From the San Diego Bay Project home page) 
For such applications, the methods are applicable to any 
geographic area in the world. Many of the datasets for 
environmental information are in the form of directly cap- 
tured or computer-rendered images, which depict natural 
(mostly geological) processes, their spatial distribution, and 
time progression of measurands. It is a common practice for 
environmental scientists to search for similar conditions 
around the globe, which amounts to searching for similar 
images. 
Medical 

A significant amount of effort is being spent in nation- 
wide health care programs for early detection of cancer. 
Image comparison is one of the fundamental methods for 
detecting suspicious regions in a medical image. 
Specifically, consider a cancer-screening center where a 
large number of fine needle aspiration cytology (FNAC) 
tests are conducted daily for breast cancer. We can envision 
a system that uses the system's image-similarity techniques
to provide an intelligent screening aid for the practicing 
cytologist. After the slide is prepared, it is scanned by a 
camera-equipped microscope at different levels of magnifi- 
cation. At each magnification level, the slide is compared to 
a database of other slides (or an existing pre-annotated atlas)
at the same magnification, and similarity is computed in 
terms of cell density, number of nuclei, shapes of nuclei, and
number of dividing cells. Suspicious regions of the slide are
presented to the cytologist for closer inspection. If nothing
suspicious is found, the system might suggest skipping the
next higher level of magnification. The cytologist could
always override the suggestion, but in general, it would save
the cytologist the tedium of scanning through the entire
slide, and thus increase his or her productivity.

Multimedia

Digital libraries of videos are becoming common due to
the large number of sports, news, and entertainment videos
produced daily. Searching capabilities for a video library
should allow queries such as "show other videos having
sequences like this one." If the query sequence has a car
chase in it, the system should retrieve all videos with similar
scenes and make them available to the user for replay. The
basic technology to achieve this relies on detection of edit
points (cuts, fade-ins, and dissolves), detection of camera
movements (pan and zoom), and characterization of a segmented
sub-sequence in terms of its motion properties. Also needed is a
smooth integration with a database system containing textual
information (such as the cast, director, and shooting
locations), and other library facilities for which software
products already exist.

V. APPLICATION DEVELOPMENT 

A present embodiment of the VIR Engine is delivered as
a statically or dynamically linkable library for a wide variety
of platforms (such as Sun, SGI, Windows, and Apple
Macintosh). The library is database independent and contains
purely algorithmic code with no dependencies on file
systems, I/O mechanisms, or operating systems. The engine
does not impose a constraint on the mechanism used to
persistently store the image features. An application could
manage the data using a relational database, an object-oriented
database, or a simple file system approach. In this
way, the VIR Engine is highly portable, and can be considered
for specialized processors and embedded applications.
FIG. 7 shows the interaction between the Engine and other
components of an end-user application.

The VIR Engine is intended as an infrastructure around
which applications may be developed. Image management,
thumbnails, database interfaces, and user interfaces are the
responsibility of the application developer. In particular,
persistent storage of feature vectors is up to the application.
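
Because persistent storage is left to the application, the simple file system approach mentioned above might look like the following minimal sketch. It assumes that a feature vector can be treated as an opaque, contiguous block of bytes whose size is reported by the engine; the helper names save_feature_blob and load_feature_blob are hypothetical and are not part of the VIR Engine API. The byte count is written in the host's native byte order, so files written this way are not portable between architectures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the VIR Engine API).
 * Writes one record: a 32-bit byte count followed by the raw bytes.
 * Returns 0 on success, -1 on failure. */
int save_feature_blob(FILE *fp, const unsigned char *data, uint32_t count)
{
    assert(fp != NULL && (data != NULL || count == 0));
    if (fwrite(&count, sizeof count, 1, fp) != 1)
        return -1;
    if (count > 0 && fwrite(data, 1, count, fp) != count)
        return -1;
    return 0;
}

/* Hypothetical helper (not part of the VIR Engine API).
 * Reads one record written by save_feature_blob. On success, returns a
 * malloc'd buffer (the caller frees it) and stores its size in *count.
 * Returns NULL at end of file or on any error. */
unsigned char *load_feature_blob(FILE *fp, uint32_t *count)
{
    unsigned char *data;

    assert(fp != NULL && count != NULL);
    if (fread(count, sizeof *count, 1, fp) != 1)
        return NULL;

    /* Allocate at least one byte so an empty record is not mistaken
     * for an allocation failure. */
    data = malloc(*count ? *count : 1);
    if (data == NULL)
        return NULL;

    if (*count > 0 && fread(data, 1, *count, fp) != *count) {
        free(data);
        return NULL;
    }
    return data;
}
```

An application would call save_feature_blob once per analyzed object, passing the feature vector bytes and the byte count returned by the analysis step, and would call load_feature_blob in a loop at query time. A relational or object-oriented database could store the same bytes in a binary column or attribute instead.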

The VIR architecture has been designed to support both 
static images and video in a unified paradigm. The infra- 
structure provided by the VIR Engine can be utilized to 
address high-level problems as well, such as automatic, 
unsupervised keyword assignment, or image classification. 

While the above detailed description has shown,
described, and pointed out the fundamental novel features of
the invention as applied to various embodiments, it will be
understood that various omissions and substitutions and
changes in the form and details of the system illustrated may
be made by those skilled in the art, without departing from
the intent of the invention.

A sample application template (example program) is 
provided as follows: 
/*
 * example program
 *
 * Description: Example program
 *
 * This simple program exercises typical entry points in the Virage Image
 * Engine API.
 *
 * In particular, we illustrate:
 *
 * - Creating an Image Engine
 * - Creating a default schema
 * - Creating a media object from an array of pixels
 * - Analyzing a media object to create a feature vector
 * - Creating and setting a weights structure
 * - Comparing two feature vectors to produce a distance
 * - Proper destruction of the above objects
 *
 * Copyright (c) 1996 Virage, Inc.
 */

#include <stdlib.h>
#include <stdio.h>

#ifndef VIR_ENG_ENGINE_C_H
#include <eng_engine_c.h>
#endif

#ifndef VIR_VIRCORE_H
#include <vircore_c.h>
#endif

#ifndef VIR_IMG_IO_C_H
#include <img_io_c.h>
#endif

#ifndef VIR_IMG_PRIM_C_H
#include <img_prim_c.h>
#endif

#define WIDTH 128
#define HEIGHT 128
#define IMAGE1 "image1"
#define IMAGE2 "image2"
#define GLOBAL_WEIGHT 1.0
#define LOCAL_WEIGHT 0.5
#define TEXTURE_WEIGHT 0.3
#define STRUCTURE_WEIGHT 0.6

vir_engPrimitiveID
default_primitives[] = { VIR_GLOBAL_COLOR_ID,
                         VIR_LOCAL_COLOR_ID,
                         VIR_TEXTURE_ID,
                         VIR_STRUCTURE_ID };

vir_float
default_weights[] = { GLOBAL_WEIGHT,
                      LOCAL_WEIGHT,
                      TEXTURE_WEIGHT,
                      STRUCTURE_WEIGHT };

#define N_DEFAULT_WEIGHTS 4
/*
 * This convenience function creates a vir_medMedia object from
 * a file which contains raw WIDTH x HEIGHT RGB (interleaved) data,
 * and then computes a feature vector for the object. The feature
 * vector (and its size) are returned to the caller.
 *
 * For users of the Virage IRW module, there are numerous routines
 * for reading and writing standard file formats (e.g., gif, jpeg,
 * etc.) directly to/from Virage vir_medMedia objects.
 */
void
CreateAndAnalyzeMedia( const char * filename,
                       vir_engEngineH engine,
                       vir_engSchemaH schema,
                       vir_engFeatureVectorData ** feature,
                       vir_engByteCount * count )
{
    vir_MediaH media;
    vir_byte * data;
    vir_uint32 image_size;
    int bytes_read;
    FILE * fp;

    /* >>>>> Begin Execution */

    /* Open the file of raw pixels */




    fp = fopen(filename, "rb");
    if (fp == NULL)
    {
        fprintf(stderr, "Unable to open file %s\n", filename);
        exit(-1);
    }

    image_size = WIDTH * HEIGHT * 3;

    /* Create a buffer to hold the pixel values */
    data = (vir_byte *)malloc(image_size);
    if (data == NULL)
    {
        fprintf(stderr, "Problems allocating data buffer\n");
        exit(-1);
    }

    /* Read the pixels into the buffer and close the file */
    bytes_read = fread(data, sizeof(vir_byte), image_size, fp);
    fclose(fp);
    if (bytes_read != image_size)
    {
        fprintf(stderr, "Problems reading file %s\n", filename);
        exit(-1);
    }

    /* Create our media object from the buffer */
    if ( vir_imgCreateImageFromData( WIDTH, HEIGHT, data, &media ) != VIR_OK )
    {
        fprintf(stderr, "Problems creating image\n");
        exit(-1);
    }

    /* Free the data buffer. The media object has made a private copy */
    free(data);

    /* Now we analyze the media object and create a feature vector */
    if ( vir_engAnalyze(engine, schema, media, feature, count) != VIR_OK )
    {
        fprintf(stderr, "Problems analyzing image\n");
        exit(-1);
    }

    /* Now that we are done with the media object, we destroy it */
    if ( vir_DestroyMedia(media) != VIR_OK )
    {
        fprintf(stderr, "Problems destroying media\n");
        exit(-1);
    }
}

int
main( int argc,
      char * argv[] )
{
    vir_engFeatureVectorData * feature1;
    vir_engFeatureVectorData * feature2;
    vir_engByteCount count1;
    vir_engByteCount count2;
    vir_engEngineH engine;
    vir_engSchemaH schema;
    vir_float distance;
    vir_engWeightsH weights;

    /* >>>>> Begin Execution */

    /* We create a default image engine */
    if ( vir_imgCreateImageEngine( &engine ) != VIR_OK )
    {
        fprintf(stderr, "Problem creating image engine\n");
        exit(-1);
    }

    /* We create a default image schema */
    if ( vir_imgCreateDefaultSchema( vir_DEFAULT_SCHEMA_20, engine, &schema )
         != VIR_OK )
    {
        fprintf(stderr, "Problems creating schema\n");
        exit(-1);
    }

    /*
     * Now we'll use our convenience function to create feature vectors.
     * We don't bother checking return codes; the function bombs out
     * on any error condition...
     */
    CreateAndAnalyzeMedia(IMAGE1, engine, schema, &feature1, &count1);
    CreateAndAnalyzeMedia(IMAGE2, engine, schema, &feature2, &count2);

    /*
     * Now I have the feature vectors in hand. In a real application I might
     * choose to store them persistently, perhaps as a column in a relational
     * database, as part of an object in an OODB, or as part of the header of a
     * file format. In this toy example, we'll just compare these vectors against
     * each other and print out the visual distance between the images that they
     * represent... not very interesting, but illustrative at any rate.
     */

    /*
     * Create a weights structure. We initialize the weights to some arbitrary
     * values which we have #define'd above. In a real application, we would
     * probably get these weights from a user interface mechanism like a slider,
     * but again, this is just to illustrate the API...
     */
    if ( vir_engCreateAndInitializeWeights( default_primitives,
                                            default_weights,
                                            N_DEFAULT_WEIGHTS,
                                            &weights ) != VIR_OK )
    {
        fprintf(stderr, "Problems setting / normalizing weights\n");
        exit(-1);
    }

    /* Repeat the comparison many times as a rough timing exercise */
    printf( "Starting 500000\n" );
    for ( int ii = 0; ii < 500000; ii++ )
    {
        vir_engCompare( engine, feature1, feature2, weights, &distance );
    }
    printf( "Done.\n" );

    /* Finally, we'll compare the two feature vectors and print out the distance! */
    if ( vir_engCompare( engine, feature1, feature2, weights, &distance ) !=
         VIR_OK )
    {
        fprintf(stderr, "Problems comparing the images\n");
        exit(-1);
    }

    fprintf(stdout, "The distance is %f!\n", distance);

    /* We're done with the feature vectors */
    if ( (vir_engDestroyFeatureVectorData(feature1) != VIR_OK ) ||
         (vir_engDestroyFeatureVectorData(feature2) != VIR_OK ) )
    {
        fprintf(stderr, "Problems destroying feature vector\n");
        exit(-1);
    }

    /* Clean up the schema */
    if ( vir_engDestroySchema(schema) != VIR_OK )
    {
        fprintf(stderr, "Problems destroying the schema\n");
        exit(-1);
    }

    /* Clean up the engine */
    if ( vir_engDestroyEngine(engine) != VIR_OK )
    {
        fprintf(stderr, "Problems destroying the engine\n");
        exit(-1);
    }

    return 0;
}
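
The top N query process 550 described earlier can be illustrated in the same spirit. The following is a minimal, self-contained sketch under stated assumptions rather than the engine's actual implementation: FeatureVec, Result, threshold_compare, and top_n_query are hypothetical names, each primitive is reduced to a single value in [0,100], the primitive distance is a plain absolute difference, and the weights are assumed to be non-negative and to sum to 1 so that scores stay in the range [0 . . . 100]. The handling of a results list that is not yet full is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define N_PRIMS   4       /* primitives per feature vector (illustrative) */
#define MAX_SCORE 100.0   /* top of the preferred score range [0..100] */

/* Hypothetical stand-in for a feature vector: one value per primitive. */
typedef struct { double v[N_PRIMS]; } FeatureVec;

/* One entry in the sorted results list. */
typedef struct { size_t index; double score; } Result;

/* Stand-in for the threshold compare process: sum weighted primitive
 * scores and give up as soon as the running total exceeds the threshold.
 * Returns 1 with the final score in *score, or 0 for "threshold exceeded". */
int threshold_compare(const FeatureVec *a, const FeatureVec *b,
                      const double *weights, double threshold, double *score)
{
    double total = 0.0;
    for (int p = 0; p < N_PRIMS; p++) {
        double d = a->v[p] - b->v[p];   /* toy primitive distance */
        if (d < 0.0)
            d = -d;
        total += weights[p] * d;
        if (total > threshold)
            return 0;
    }
    *score = total;
    return 1;
}

/* Top N query. The results array must have room for n + 1 entries.
 * Returns the number of results kept (at most n), sorted by ascending score. */
size_t top_n_query(const FeatureVec *target, const FeatureVec *db, size_t db_len,
                   const double *weights, Result *results, size_t n)
{
    double T = MAX_SCORE;   /* threshold T starts at the maximum score */
    size_t c = 0;           /* result count C, always 0 <= C <= N */

    assert(target != NULL && weights != NULL && results != NULL);
    if (n == 0)
        return 0;

    for (size_t i = 0; i < db_len; i++) {
        double s;
        size_t pos;

        if (!threshold_compare(target, &db[i], weights, T, &s))
            continue;       /* threshold exceeded: ignore this object */

        /* Insert the new score in sorted position. When the list is
         * already full this briefly uses slot n, giving N + 1 entries. */
        pos = c;
        while (pos > 0 && results[pos - 1].score > s) {
            results[pos] = results[pos - 1];
            pos--;
        }
        results[pos].index = i;
        results[pos].score = s;

        if (c < n)
            c++;            /* list still filling (assumed behavior) */
        /* otherwise the (N+1)th entry in slot n is simply dropped */

        if (c == n)
            T = results[n - 1].score;   /* tighten T to the Nth score */
    }
    return c;
}
```

Once the list holds N results, every later object is compared against the score of the current Nth result, so most poor matches are rejected after only one or two primitives have been evaluated. This is where the speed increase for small N comes from.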



What is claimed is: 

1. A method of visual object comparison for a database of
visual objects, comprising the steps of:

a) applying primitives to a first visual object to extract a
first feature vector, each primitive providing at least
one primitive value to the first feature vector;

b) applying primitives to a second visual object to extract
a second feature vector, each primitive providing at
least one primitive value to the second feature vector;

c) providing an ordering value for each primitive to order
the primitives;

d) comparing one of the primitive values from the first
feature vector with the corresponding primitive value
of the second feature vector according to the ordering
so as to obtain a primitive score;

e) applying a primitive weight to the primitive score to
determine a weighted primitive score;

f) summing the weighted primitive score into a summed
total score; and

g) repeating steps d-f until the summed total score crosses
a selected threshold.

2. The method defined in claim 1, wherein the repeating 
step alternatively repeats until all primitives in one of the 
feature vectors have been processed to produce a final score. 

3. The method defined in claim 1, wherein the ordering 
value is the primitive weight corresponding to each primi- 
tive. 

4. The method defined in claim 1, wherein the ordering 
value is a cost associated with the execution time of the 
primitive. 

5. The method defined in claim 1, wherein the ordering 
value is a combination of the primitive weight correspond- 
ing to each primitive and a cost associated with the execu- 
tion time of the primitive. 

6. The method defined in claim 1, wherein the order of the 
primitives is defined by a function which orders the primi- 
tives from least ordering value to greatest ordering value. 

7. The method defined in claim 6, wherein the function 
comprises multiplication. 



8. The method defined in claim 6, wherein the function is
defined as maximum((1.0-cost), weight). 

9. A software system for visual object comparison of a 
database of visual objects, the system comprising: 

means for applying primitives to a first visual object to 
extract a first feature vector, each primitive providing at 
least one primitive value to the first feature vector; 

means for applying primitives to a second visual object to 
extract a second feature vector, each primitive provid- 
ing at least one primitive value to the second feature 
vector; 

means for providing an ordering value for each primitive
to order the primitives; and

means for thresholding including:

a) comparing one of the primitive values from the first 
feature vector with the corresponding primitive 
value of the second feature vector according to the 
ordering so as to obtain a primitive score, 

b) applying a primitive weight to the primitive score to 
determine a weighted primitive score, 

c) summing the weighted primitive score into a 
summed total score, and 

d) repeating a-c until the summed total score meets a 
selected threshold. 
10. A program storage device storing instructions that
when executed by a computer perform a method for a
threshold based visual object comparison of a database of
visual objects, the method comprising:

a) applying primitives to a first visual object to extract a
first feature vector, each primitive providing at least
one primitive value to the first feature vector;

b) applying primitives to a second visual object to extract
a second feature vector, each primitive providing at
least one primitive value to the second feature vector;

c) providing an ordering value for each primitive to order
the primitives;

d) comparing one of the primitive values from the first
feature vector with the corresponding primitive value
of the second feature vector according to the ordering
so as to obtain a primitive score;

e) applying a primitive weight to the primitive score to
determine a weighted primitive score;

f) summing the weighted primitive score into a summed
total score; and

g) repeating steps d-f until the summed total score meets
a selected threshold.

* * * * * 